FIT honours projects listing

Ranking Semantic Relationships Between Entities in Microblogs (18 or 24 pts)
Supervisors:
Mark Carman and Yuan-Fang Li

The immense quantity of information provided by microblogging services, as well as recent improvements in the Semantic Web, inspires us to explore ways of mining the former using the tools of the latter. The increase in popularity and size of microblogging services has resulted in an overload of information and a need for effective methods to browse and explore microblog messages. To tackle this overload, users need guidance and recommendations on how to find relevant pieces of information. By identifying relevant semantic paths (relationships) between entities present in individual microblog posts, we hope to support future efforts in semantic search for this medium.

Aim and Outline
In this project, we would like to continue our investigation of a statistical approach to the problem of ranking relevant paths between entities in a microblog post. Making use of the Linked Data infrastructure and existing entity extraction systems, we will propose and evaluate two novel path-ranking algorithms based on co-occurrence frequency. The effectiveness of the proposed methods will be evaluated on a large Twitter dataset.
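
To make the ranking idea concrete, here is a hypothetical sketch (the entity names, data and scoring rule are illustrative only, not the algorithms to be developed in this project): a candidate path between two entities is scored by how often its consecutive entities co-occur in the tweet corpus.

```python
# Hypothetical co-occurrence-based path scoring; data and scoring rule are
# illustrative only, not the project's actual algorithms.
from collections import Counter
from itertools import combinations

tweets = [
    {"Barack Obama", "White House"},
    {"Barack Obama", "Michelle Obama", "White House"},
    {"Michelle Obama", "Chicago"},
]
# Count how often each pair of entities appears in the same tweet.
cooc = Counter(frozenset(p) for t in tweets for p in combinations(sorted(t), 2))

def path_score(path):
    """Score a path by the average co-occurrence of consecutive entities."""
    pairs = [frozenset(p) for p in zip(path, path[1:])]
    return sum(cooc[p] for p in pairs) / len(pairs)

print(path_score(["Barack Obama", "White House"]))                # 2.0
print(path_score(["Barack Obama", "Michelle Obama", "Chicago"]))  # 1.0
```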

URLs and References
B. Aleman-Meza et al. "Ranking complex relationships on the semantic web." Internet Computing, IEEE 9.3 (2005): 37-44.

F. Abel, I. Celik, G.-J. Houben, and P. Siehndel. Leveraging the semantics of tweets for adaptive faceted search on twitter. In International Semantic Web Conference (1), volume 7031 of Lecture Notes in Computer Science, pages 1-17. Springer, 2011.

I. Celik, F. Abel, and G.-J. Houben. Learning semantic relationships between entities in twitter. In S. Auer, O. Díaz, and G. A. Papadopoulos, editors, ICWE, volume 6757 of Lecture Notes in Computer Science, pages 167-181. Springer, 2011.

K. Bontcheva and D. Rout. Making sense of social media streams through semantics: a survey. Semantic Web Journal, 2012.


Applying Lean to Distribution (24 pts)
Supervisors:
Yen Cheung, Vincent Lee, Rabi Gunaratnam (Timstock Ltd)

Pioneered by Toyota, lean manufacturing has been applied in many companies since the 1990s with the aim of improving business performance by reducing waste. Although the idea of lean processes started in Toyota in the 1940s, this concept only became popular in the 1990s in response to the worldwide recession at the time. Today, lean processing or lean 'thinking' is applied not only in the automotive industry but in other industries as well. Since the recent global financial crisis, companies have been seeking to become leaner organisations to remain competitive.

Aim and Outline
This research project involves applying lean 'thinking' to a distribution centre that plans and delivers goods to customers located mainly in Victoria. Currently the company relies on a combination of manual and software-enabled processes for planning and delivery of its products, which are prone to errors and waste.

The expected outcomes of this project are:

* A proposal to the company for improving the current business processes;
* Implementation of the business improvement plan;
* Evaluation of the business improvement plan.

URLs and References
"The benefits of Lean Manufaturing: what lean thinking has to offer process industries", Melton T, Chemical Engineering Research and Design, 83(A6), 662-673, June 2005.

http://mimesolutions.com/PDFs/WEB%20Trish%20Melton%20Lean%20Manufacturing%20July%202005.pdf

Pre- and Co-requisite Knowledge
Students who have achieved an overall grade of D or higher in their prior degree and have an interest in applying business process improvements are encouraged to apply. There is a scholarship attached to this project, provided by Timstock Ltd.


ANZ Project: Develop a community model for ANZ corporate centre (24 pts)
Supervisors:
Vincent Lee and Yen Cheung, Adam Hart (ANZ)

ANZ has previously operated as separate lines of business. As we seek to leverage our capabilities across the group, we want to optimise processes and working relationships among our various teams.

ANZ wants to find ways of measuring the relationships among our communities, see how informal processes and information flows differ from the formal processes, and adapt our organisation to function more efficiently.

Aim and Outline
Model the communities of practice in Finance, HR and Risk (business and IT) to establish currently active permissions, prohibitions and obligations.

The expected outcome is a testable community model in a banking services environment.

URLs and References
http://www.nehta.gov.au/implementation-resources/ehealth-foundations/EP-1144-2007

Pre- and Co-requisite Knowledge
Students should have achieved at least a D in e-business or equivalent units. Persistence, active listening and some familiarity with normative corporate roles and responsibilities are also expected.


ANZ Project: Customer Experience of the future - Unassisted Channel (for two students) (24 pts)
Supervisors:
David Taniar, Vincent Lee, Colin Dinn (ANZ) and Tim Liddelow (ANZ/SAP Team Partner)

As ANZ expands across the Asia Pacific region, we seek to provide a seamless and consistent customer experience, and minimise duplication of effort, by leveraging common capabilities.

In seeking to build enterprise capability for our digital channels, we must balance conflicting needs:

  • Culture: Whilst maintaining a consistent brand and experience across markets, we need to meet local language, cultural and customer behaviour expectations.
  • Regulations: We must comply with global and local regulations in each market in which we operate.
  • Maturity: In developed markets, customers expect a high degree of functionality and differentiation (and are prepared to pay for a premium product and experience). In developing markets, customers are looking for more basic products and services at reasonable cost.
  • Scale: In markets where we have a significant presence, we achieve economies of scale. In smaller markets, we need to scale our operations down to achieve profitability at low revenue levels.

ANZ therefore seeks to build a capability that can be leveraged across the region in an economically and technically sustainable way.

Aim and Outline
Define the digital Customer Experience of the future, and build a working prototype on SAP Sybase software.
We want to consider innovative ways of (i) creating controls within a digital development environment, (ii) defining the regional asset as a level of business standardisation, (iii) defining the controls for the regional asset, (iv) defining how we would manage the localisation to ensure integrity in the regional asset. Benefits include:

  • Significant exposure to the Asia context of technology, banking, and commerce.
  • Potentially part of the engagement will be in Singapore at ANZ cost.

Expected outcome: Working Prototype on SAP Sybase software

Pre- and Co-requisite Knowledge
Digital exposure and customer experience techniques, possibly customer centred design. Development skills, especially hybrid.


The managerial use of mobile business intelligence (18 pts)
Supervisors:
Caddie Gao and David Arnott

Both mobile technology and business intelligence (BI) have been rated by the Gartner Global CIO Survey as top technology priorities for organisations in the last few years. Although many managers have incorporated mobile devices into their work routine for decision-making tasks, little research has been conducted to explore how managers use BI on mobile devices.

Aim and Outline
This project aims to investigate how managers use mobile BI for their decision-making tasks. It will be exploratory in nature and can be investigated through different theoretical lenses (including but not limited to task-technology fit, unified theory of acceptance and use of technology).

The project can be undertaken by multiple students with each student using a different method, for example, one student could use a survey method, another a lab-based approach, and another a case study.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.


Can managers effectively use business analytics? (18 pts)
Supervisors:
David Arnott and Caddie Gao

Business analytics (BA) is currently the boom area of business intelligence (BI) - the use of IT to support management decision-making. BI is rated by industry analyst Gartner as the top technology for chief information officers worldwide. Most BI vendors are aggressively marketing BA software, especially predictive analytics. These vendors assume that managers will understand the statistical techniques used by their software. Conversations with CIOs and other senior IT executives have indicated that managers in large organizations do not always have this knowledge.

Aim and Outline
The aim of the project is to investigate whether or not business managers have the requisite background knowledge to effectively use BA systems.

The project can use either a survey or a laboratory experiment. It may be possible for two students to tackle the project at the same time, each using a different research method.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.


ANZ Project: Interactive Visualization Techniques for ANZ Data (24 pts)
Supervisors: Tim Dwyer

Traditionally, "big data" was stored in relational database management systems that required the data to be modeled in tabular form. Graph databases are an exciting new storage technology that is at the heart of technologies like Google's "Knowledge Graph" and Facebook's "Graph Search". However, graph databases are also causing the industry to rethink how it models and stores all kinds of rich, heterogeneous, interlinked data.

This paradigm shift is causing people to think about their data in new ways. For example, query languages like SPARQL and Gremlin allow data querying and manipulation in terms of graph traversals instead of table joins. These expert programming languages are evolving quickly, but tools for non-experts are lagging behind.
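
As a toy illustration of the shift from joins to traversals (the data and code below are hypothetical and not tied to any particular graph database), a friends-of-friends query can be expressed as a short walk over an adjacency map, whereas a relational store would typically answer it with a self-join on a friends table.

```python
# Toy adjacency-map graph; a graph database would express the same query as a
# traversal (e.g. in Gremlin or SPARQL) rather than as a relational self-join.
follows = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave", "erin"],
    "dave":  [],
    "erin":  [],
}

def friends_of_friends(graph, start):
    direct = set(graph[start])
    # Walk one more hop from each direct neighbour.
    return {fof for f in direct for fof in graph[f]} - direct - {start}

print(friends_of_friends(follows, "alice"))   # {'dave', 'erin'}
```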

Aim and Outline
To us, thinking of data in terms of graphs opens up exciting new opportunities for dynamic interfaces to allow people to explore their data visually. This project will explore graph visualization techniques and fluid user interfaces for enabling these scenarios. Graphics, Visualization, Human Computer Interaction, User Experience Design and Layout Algorithms and Techniques will all play a part in this project.

Pre- and Co-requisite Knowledge
We are probably most interested in developing HTML5-based UIs, so some experience with HTML and JavaScript would be helpful. Experience with, or a passion for, novel interface design is also desirable.


Porting snob to Weka (18 or 24 pts)
Supervisors:
Peter Tischer and David Albrecht

Snob is a program for unsupervised classification, also known as clustering, that was developed by Professor Chris Wallace, the foundation professor of Computer Science at Monash University. Given a body of observations about things, Snob tries to determine how the things can best be grouped into clusters or classes. (The term 'snob' is not an acronym. A snob is a person who makes class distinctions.) Unfortunately, Snob is not widely known or used. Over the years a number of variants of Snob have been developed, including a parallel implementation and a version which incorporates factor analysis.

WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato. WEKA is free software available under the GNU General Public License. WEKA provides a limited range of functionality for clustering.

Aim and Outline
The aim of this project is to make a version of Snob available to a general computing audience by porting it to WEKA. This might be done by providing a wrapper to an existing Snob implementation, or by translating or re-implementing Snob in Java. The project has scope to move into areas like how to change or improve aspects of the Snob implementation, or how to compare different unsupervised classification algorithms in a standard testbed environment.

URLs and References
en.wikipedia.org/wiki/Weka_(machine_learning)

www.datamining.monash.edu.au/software/index.shtml

Pre- and Co-requisite Knowledge
Knowledge of Java or a similar programming language may be useful.


Business Intelligence and Data Warehousing Governance (24 pts)
Supervisors: David Arnott, Rob Meredith

IT governance is an area of considerable academic and industry interest. Business intelligence (BI) systems have been rated by the Gartner Global CIO Survey as a top technology priority for organizations for the last five years. There has been very little research conducted on the governance of BI systems and their data warehouses (DW). This project builds on preliminary Monash research on BI and DW governance.

This project will investigate how BI and DW are governed, whether they use different governance strategies to enterprise systems, and whether they themselves need different strategies. The foundation theories for the project will be the matrix and contingency theories of governance. The project can be undertaken by multiple students, with each student using a different method, for example, one student could use a survey method and another a case study approach.

The project is suitable for bachelor honours and masters honours students who have an information systems orientation. Studies in business and management would be helpful but are not required.


Biometric Cryptosystem (24 pts) Supervisors: Nandita Bhattacharjee, Bala Srinivasan

Biometric information can serve not only for access or authentication, but also for data protection. It is possible to generate a key from the biometric information that can be used with some cryptographic algorithm to encipher and decipher data. In this project we shall study methods of generating a key from fingerprints or iris codes, addressing the fundamental problems of biometric cryptography: key change and distortion tolerance. Given the biometrics, a set of reliable features is extracted. We shall design a system in which we do not need to store a biometric template, but only a string of error-correction data from which the biometric cannot be derived, and from which the key cannot be derived either unless the biometric is present.
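
One well-known construction in this space is the fuzzy commitment scheme of Juels and Wattenberg, in which only "helper data" (the XOR of an error-correction codeword with the biometric features) is stored. The sketch below is a toy illustration of that idea using a simple repetition code; it is not the design to be developed in this project, which would use a proper error-correcting code (e.g. BCH or Reed-Solomon) and stronger key handling.

```python
# Toy fuzzy-commitment-style key binding (hypothetical parameters):
# only helper data and a key hash are stored, never the biometric itself.
import hashlib
import secrets

REP = 5  # repetition factor of the toy error-correcting code

def encode(bits):
    """Repetition-encode a key bit string."""
    return [b for b in bits for _ in range(REP)]

def decode(bits):
    """Majority-vote decode a repetition-encoded bit string."""
    return [int(sum(bits[i:i + REP]) > REP // 2)
            for i in range(0, len(bits), REP)]

def enrol(biometric_bits, key_bits):
    """Store only helper data: codeword XOR biometric features."""
    codeword = encode(key_bits)
    helper = [c ^ b for c, b in zip(codeword, biometric_bits)]
    return helper, hashlib.sha256(bytes(key_bits)).hexdigest()

def release(biometric_bits, helper, key_hash):
    """Recover the key from a fresh (noisy) biometric reading."""
    noisy_codeword = [h ^ b for h, b in zip(helper, biometric_bits)]
    key_bits = decode(noisy_codeword)
    ok = hashlib.sha256(bytes(key_bits)).hexdigest() == key_hash
    return key_bits if ok else None

# Toy usage: a 16-bit key bound to an 80-bit feature string.
key = [secrets.randbelow(2) for _ in range(16)]
template = [secrets.randbelow(2) for _ in range(16 * REP)]
helper, check = enrol(template, key)
noisy = template[:]          # a later reading with one flipped bit
noisy[3] ^= 1
assert release(noisy, helper, check) == key
```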

References

Uludag, U.; Pankanti, S.; Prabhakar, S.; Jain, A.K.; "Biometric cryptosystems: issues and challenges", Proceedings of the IEEE, Volume 92, Issue 6, June 2004, Page(s): 948-960.

Kanade S., Petrovska-Delacretaz D. and Dorizzi B., "Cancelable iris biometrics and using error correction codes to reduce variability in biometric data", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pages 120-127.


Profiling and Predicting Users' Tweets on Twitter (18 or 24 pts)
Supervisor: Mark Carman

Background info. explaining the project context
Micro-blogging websites such as Twitter contain a large amount of valuable information regarding the time-varying interests of their users. To know which musical artists or political themes are "hot" at the moment we need only look at their level of discussion in the Twitter stream.

Project aim and basic outline of approach
The aim of this project is to model individual users' message streams using statistical (language and topic) modelling techniques. The intention is to determine to what extent it is possible to identify user interests and to predict the topic of future tweets.
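
As a flavour of the kind of modelling involved (a minimal sketch with hypothetical data; real topic models such as LDA are considerably richer), a per-user unigram language model can be estimated from a user's past tweets and used to score how well it predicts the words of a new message.

```python
# Minimal per-user unigram language model (illustrative data only).
import math
from collections import Counter

def train(tweets):
    """Estimate a per-user unigram model with add-one smoothing."""
    counts = Counter(w for t in tweets for w in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1            # crude allowance for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def log_likelihood(model, tweet):
    """Higher values mean the tweet fits this user's profile better."""
    return sum(math.log(model(w)) for w in tweet.lower().split())

user_model = train(["off to the climate rally", "great climate panel today"])
print(log_likelihood(user_model, "another climate protest tomorrow"))
print(log_likelihood(user_model, "big football match tonight"))
```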

URLs and bibliographic references to background reading
This paper looks at profiling users based on information in their query log

Pre- and co-requisite knowledge and units studied as appropriate
Good understanding of maths/stats.


Investigating the Role of Ecosystem Engineers in an Agent-Based Evolutionary Simulation (24 pts)
Supervisors: Alan Dorin, Kevin Korb

The purpose of this project is to build and visualise a simple, evolving, virtual ecosystem that supports the emergence of ecosystem engineers based upon the existing work of a previous student. Physical ecosystem engineers physically alter their biotic or abiotic environment and thereby control or modulate the availability of resources to (or forces acting on) other organisms. These physical changes destroy, maintain or create habitat for other organisms [1]. Their presence is often a key factor in ecosystem behaviour. A tree is an example of a significant physical ecosystem engineer: it provides habitat for mosses, insects and birds; its roots trap soil and leaf matter, altering the impact of wind and water erosion; its branches harbour larvae or tadpoles within pools etc. Coral produces reefs, wombats dig holes, lyrebirds and blackbirds sift leaf litter. These species (and humans!) are physical ecosystem engineers that have a large impact on organisms around them.

To date, little research has been done employing agent-based modelling techniques (techniques that simulate individual intelligent agents and their interactions) to study the emergence of ecosystem engineers in virtual ecosystems. This project will investigate how physical ecosystem engineering impacts on:
  • habitats
  • the number of niches an ecosystem supports
  • the number of trophic levels in an ecosystem
  • species diversity,
  • ecosystem resilience and stability over evolutionary time periods.

It appears that no generic simulations exploring these properties of ecosystems over evolutionary time periods have yet been devised. In fact, as yet there seem to be no simulations investigating the emergence of ecosystem engineers at all [2]. Do they evolve readily and under what circumstances? Is there a basic organizational property of ecosystems that requires them? This is an exciting opportunity to apply your computer science to a real-world biological problem in need of serious study. A thoroughly completed Honours project in this area would likely result in a paper publication in an international conference or journal.

It would be of considerable benefit if students engaged on this project enrolled in the honours unit FIT4012 Advanced Topics in Computational Science. Experience with computer graphics programming in C++ is an advantage.

[1] Gutiérrez, J.L. and C.G. Jones, Physical Ecosystem Engineers as Agents of Biogeochemical Heterogeneity. BioScience, 2006. 56(3): p. 227-237

[2] Dorin A., Korb K.B. & Grimm V., "Artificial-Life Ecosystems: What are they and what could they become?", In Proceedings of the Eleventh International Conference on Artificial Life, S. Bullock, J. Noble, R. A. Watson, and M. A. Bedau (Eds.), MIT Press, Cambridge, MA. 2008, pp.173-180


Multitouch Interactive Parameter Search for a Visual Agent-based Model of Bee Foraging (24 pts)
Supervisors: Alan Dorin, Michael Wybrow, Nic Geard, Adrian Dyer

Agent-based models (ABMs) are simulations that operate from the “bottom up”. That is, they explicitly represent individual components (termed agents) that interact with one another and their environment to generate “emergent” phenomena at the group level. For instance, a population of individual birds (the agents) may be simulated and visualised to generate a flock and its associated behaviour (the emergent phenomenon). An agent’s behaviour at any moment during ABM simulation is dictated by its current state. State is often unique to each agent and may be influenced by that agent’s particular life history and perception of its local environmental conditions. Thus, ABMs maintain the basic principles that each individual in a population has unique behavioural and physiological qualities resulting from genetic and environmental influences; and that interactions between organisms are local – each individual is affected primarily by its local environment including by other organisms in its proximity [Huston et al 1988].
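
The sketch below gives a minimal, purely illustrative example of the ABM pattern described above (hypothetical parameters, one-dimensional world, not the model to be built in this project): each bee agent carries its own state, perceives only its local cell, and the group-level quantity of interest emerges from the accumulated local interactions.

```python
# Minimal illustrative ABM: per-agent state, local perception, emergent total.
import random

class Bee:
    def __init__(self, world_size):
        self.pos = random.randrange(world_size)   # per-agent state
        self.nectar = 0

    def step(self, flowers):
        if flowers[self.pos] > 0:                  # perceive local cell only
            flowers[self.pos] -= 1                 # forage
            self.nectar += 1
        else:                                      # move to a neighbouring cell
            self.pos = (self.pos + random.choice((-1, 1))) % len(flowers)

world = [random.randint(0, 3) for _ in range(50)]  # nectar available per cell
bees = [Bee(len(world)) for _ in range(20)]
for _ in range(100):                               # simulation time steps
    for bee in bees:
        bee.step(world)
print("total nectar gathered:", sum(b.nectar for b in bees))
```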

In this project you will design novel multitouch-based interactive tools for exploring complex parameter spaces typical of agent-based models. The focus will be on simple simulation models of bee foraging behaviour that you will need to develop during the course of the project. You will build the software to run on a new PQ-Labs 32-point, 42" multi-touch surface (http://multi-touch-screen.com/) and versions for use on a conventional desktop computer with keyboard and mouse. A number of designs will be trialled in order to determine the most effective means of representing a multi-parameter space visually for interaction. The aim is to allow users to explore the space of emergent outcomes generated by the simulation and outline the parameter regions that give rise to the most interesting phenomena. Simple user-testing will be conducted by the student and supervisors. (The focus of this project is not on this aspect of interface design.) The purpose of the interface tool for this project is to map out regions in which various bee foraging strategies outperform competing strategies.

This is an untested idea that will require considerable creativity on the part of the student. A love of interactive computer graphics and graphic design is an advantage! Experience coding in C++ is a necessity. It would be of considerable benefit if students engaged on this project enrolled in the honours unit FIT4012 Advanced Topics in Computational Science.

Reading:

[1] Huston, M., D. DeAngelis, and W. Post, New Computer Models Unify Ecological Theory. BioScience, 1988. 38(10): p. 682-691.

[2] Dorin A., Korb K.B. & Grimm V., Artificial-Life Ecosystems: What are they and what could they become?, In Proceedings of the Eleventh International Conference on Artificial Life, S. Bullock, J. Noble, R. A. Watson, and M. A. Bedau (Eds.), MIT Press, Cambridge, MA. 2008, pp.173-180

[3] Grimm, V. and S. F. Railsback, Individual-based Modeling and Ecology, Princeton University Press, 2005

[4] Vries, H.de and J.C. Biesmeijer, Modelling collective foraging by means of individual behaviour rules in honey-bees. Behavioural Ecology and Sociobiology, 1998. 44: p. 109-124.


Simulation of Bee Foraging (18 or 24 pts)
Supervisors: Alan Dorin, Zoe Bukovac, Adrian Dyer (RMIT), Mani Shrestha

Bees forage for nectar and pollen from flowers to support their hives. In doing this, they pollinate our crops and support reproduction of plants in natural ecosystems. Globally, this resource is worth over $200 billion AUD to crop production every year. As our climate warms, it seems that pollinator and flower interactions may be changing. It is vital for us to understand how, so that we can manage the world's food supply and natural ecosystems. Together with an international team of ecologists, botanists and computer modellers, the participant in this project will contribute to an important global effort to understand the changing behaviour of bees under climate change. The student participant will write computer models to pick apart the factors that have the potential to contribute to the changing dynamics of insect/plant relationships. The technique to be applied is "agent-based modelling".

Agent-based models (ABMs) are simulations that operate from the “bottom up”. That is, they explicitly represent individual components (termed agents) that interact with one another and their environment to generate “emergent” phenomena at the group level. For instance, a population of individual birds (the agents) may be simulated and visualised to generate a flock and its associated behaviour (the emergent phenomenon). An agent’s behaviour at any moment during ABM simulation is dictated by its current state. State is often unique to each agent and may be influenced by that agent’s particular life history and perception of its local environmental conditions. Thus, ABMs maintain the basic principles that each individual in a population has unique behavioural and physiological qualities resulting from genetic and environmental influences; and that interactions between organisms are local – each individual is affected primarily by its local environment including by other organisms in its proximity [1].

What will be done:
In this project you will design novel agent-based models for understanding bee/flower interactions. The focus will be on simple simulation models of bee foraging behaviour that you will need to develop during the project. This project is of global significance and will require considerable creativity and dedication on the part of the student. The benefit is that the project involves conducting real science of massive potential benefit.

Requirements:
Experience coding in Java or C++ is a necessity. It would be of considerable benefit if students engaged on this project enrolled in the honours unit FIT4012 Advanced Topics in Computational Science and took FIT4008 reading unit with Alan Dorin in semester 2, 2013.

Reading:
[1] Huston, M., D. DeAngelis, and W. Post, New Computer Models Unify Ecological Theory. BioScience, 1988. 38(10): p. 682-691.

[2] Dorin A., Korb K.B. & Grimm V., Artificial-Life Ecosystems: What are they and what could they become?, In Proceedings of the Eleventh International Conference on Artificial Life, S. Bullock, J. Noble, R. A. Watson, and M. A. Bedau (Eds.), MIT Press, Cambridge, MA. 2008, pp.173-180

[3] Grimm, V. and S. F. Railsback, Individual-based Modeling and Ecology, Princeton University Press, 2005

[4] Vries, H.de and J.C. Biesmeijer, Modelling collective foraging by means of individual behaviour rules in honey-bees. Behavioural Ecology and Sociobiology, 1998. 44: p. 109-124

[5] Dyer, A.G., Dorin, A., Reinhardt, V., Rosa, M., "Colour reverse learning and animal personalities: the advantage of behavioural diversity assessed with agent-based simulation", Nature Precedings pre-print, http://hdl.handle.net/10101/npre.2012.7037.1 (March 2012)


Non-standard models of computation and universality (24 pts)
Supervisor: David Dowe

Zvonkin and Levin (1970) (and possibly earlier, Martin-Löf (1966)) consider the probability that a Universal Turing Machine (UTM), U, will halt given infinitely long random input (where each bit from the input string has a probability of 0.5 of being a 0 or a 1). Chaitin (1975) would later call this the halting probability, Omega, or Omega_U. Following an idea of C. S. Wallace's in private communication (Dowe 2008a, Dowe 2011a), Barmpalias & Dowe (to appear) consider the universality probability - namely, the probability that a UTM, U, will retain its universality. If some input x to U has a suffix y such that Uxy simulates a UTM, then U has not lost its universality after input x. Barmpalias, Levin (private communication) and Dowe (in a later simpler proof) have shown that the universality probability, P_U, satisfies 0 < P_U < 1 for all UTMs U and that the set of universality probabilities is dense in the interval (0, 1). We examine properties of the universality probability for non-standard models of computation (e.g., DNA computing).
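
For orientation, the two quantities can be written as follows (standard prefix-free notation; the precise formulation used by Barmpalias & Dowe may differ in detail):

```latex
\Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|},
\qquad
P_U \;=\; \Pr\!\left[\, U \text{ remains universal after reading an infinite random binary input} \,\right],
\qquad 0 < P_U < 1 .
```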

Reference:
G. Barmpalias and D. L. Dowe, "Universality probability of a prefix-free machine", accepted, Philosophical Transactions of the Royal Society A


MML inference of systems of differential equations (24 pts)
Supervisor: David Dowe

Many simple and complicated systems in the real world can be described using systems of differential equations (Bernoulli, Navier-Stokes, etc.). Despite the fact that we can accurately describe and solve those equations, they often fail to produce accurate predictions. In this project, our goal is to create a way of inferring the system of (possibly probabilistic or stochastic) (partial or ordinary) differential equations - with a quantified noise term accounting for any inexactness - that describes a real-world system based on a set of given data. Initially we can begin by working on a single equation with one unknown. (The noise could be due to a number of effects, such as measurement inaccuracies or oversimplified models.) From there, we can progressively move to more complicated equations.

Minimum Message Length (MML) will be one of the tools used for modelling as it can provide ways of producing simpler models that actually fit closer than their more complicated counterparts produced by other methods. The project will become increasingly CPU-intensive but will ultimately have many real-world applications in a wide range of areas.
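
As an illustration (the notation is ours, not taken from the references), the single-equation starting point mentioned above might be written as

```latex
\frac{dx}{dt} \;=\; f(x, t; \theta) \;+\; \varepsilon(t),
\qquad \varepsilon(t) \sim \mathcal{N}(0, \sigma^{2}),
```

with MML trading off the cost of stating f, theta and sigma against how closely the equation fits the observed data.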

References:
Wallace (2005)

Dowe (2011a)


Econometric, statistical and financial time series modelling using MML (24 pts)
Supervisor: David Dowe, Farshid Vahid

Time series are sequences of values of one or more variables. They are much studied in finance, econometrics, statistics and various branches of science (e.g., meteorology, etc.).

Minimum Message Length (MML) inference (Wallace and Boulton, 1968) (Wallace and Freeman, 1987) (Wallace and Dowe, 1999a) (Wallace, posthumous, 2005) (Comley and Dowe, 2005) has previously been applied to autoregressive (AR) time series (Fitzgibbon et al., 2004), other time series (Schmidt et al., 2005) and (at least in preliminary manner) both AR and Moving Average (MA) time series (Sak et al., 2005).

In this project, we apply MML to the Autoregressive Conditional Heteroskedasticity (ARCH) model, in which the (standard deviations and) variances also vary with time. Depending upon progress, we can move on to the GARCH (Generalised ARCH) model or Peiris's Generalised Autoregressive (GAR) models, or to inference of systems of differential equations.
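
For reference, the standard forms of these models are (notation as commonly used in the econometrics literature, not specific to any of the references below):

```latex
\varepsilon_t = \sigma_t z_t, \quad z_t \sim \mathcal{N}(0,1); \qquad
\text{ARCH}(q):\ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2; \qquad
\text{GARCH}(p,q):\ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 .
```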

This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.

References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).

FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).

SaDR2005
ScPL2005
Wall2005
WaBo1968

WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283.

WaFr1987


Film production: how far can constraints go? (18 or 24 pts) Supervisors: Maria Garcia de la Banda, Chris Mears, Guido Tack, Mark Wallace

The "film production" (or "talent scheduling") problem, defined by [1] in 1993, is a very simplistic version of the real-life optimisation problem, which involves determining when and where scenes in a movie are filmed in order to minimise a certain objective function. While the simplified version only takes into account the cost incurred by actors who have to wait while in-between scenes, the real-life version needs to take into account many other factors, from location and light requirements, to limits in the amount of hours the crew can be working. There has been a significant amount of research on the idealised version of this problem using many different technique such as evolutionary algorithms, local search and constraint programming. However, it is not clear whether the results of this research apply to the more realistic version of the problem.

We have enlisted the help of an assistant director to many Australian movies to provide us with data and expertise regarding this problem. The aim of the project is to investigate how well the real-life problem can be modelled and solved using constraint programming.

This project is most suited for students with good mathematical, modelling and programming skills.

[1] Cheng, T. C. E., J. E. Diamond, B. M. T. Lin. 1993. Optimal scheduling in film production to minimize talent hold cost. Journal of Optimization Theory and Applications 79 479–482.

[2] Garcia de la Banda, M., Stuckey, P., Chu, G. Solving Talent Scheduling with Dynamic Programming. INFORMS Journal on Computing. 23(1): 120-137, 2011.


Exploring data management platforms for “big data” (24 pts)
Supervisor: Maria Indrawan-Santiago

The amount of data produced and consumed by applications has grown rapidly in recent years. Several organisations have seen an explosion in the amount of data that they have to collect, store and retrieve for their daily operations or decision-making support. This explosion has placed challenges on current relational DBMSs such as Oracle, SQL Server and MySQL in terms of query performance. To overcome the performance limitations of relational databases in handling large amounts of data, several alternative database models have been introduced in the last few years. This group of alternative databases is known as "NoSQL", which can be interpreted as "Not Only SQL" or "No SQL". Examples of these new approaches are Big Table, Array and HDFS. The models have been derived mainly from in-house research and development teams at major companies, such as Google with HDFS and Facebook with Haystack. Academic research contribution is very limited in this area.

There will be several possible projects in this area.

For example:

  • Finding the best performance database given different classes of queries.
  • Exploring the advancement in graph databases.

Final shape and scope of the project will be determined after discussion between supervisor(s) and individual student.

What would you learn?

  • New database technology (both theoretical and practical)
  • Making a critical analysis of new technology

What types of skill do you need?

  • Critical thinking
  • Java programming
  • Relational database
  • Database modeling

Creative Evolution of Complexity (18 or 24 points)
Supervisors: Kevin Korb, Alan Dorin, Nic Geard

A key open problem for artificial life is how to build a simulation which exhibits the kind of creativity found in natural biology. Whereas the biosphere exhibits something like exponential explosions in biodiversity following major extinction events (e.g., the "Cambrian explosion"), evolutionary artificial life so far exhibits far more modest diversity growth.

This project aims to further develop an existing artificial life simulation of an ecosystem so that it grows in complexity exponentially. Niches are defined in terms of vectors of their products and environmental requirements in a recursive (open-ended) fashion. New niches are created by variations upon old and disappear when their environmental requirements are not met. The successful student will develop the simulation beyond the existing prototype and do either or both of the following: develop new ways of visualising the web of niches that are created; analyse in both old and new ways the complexity of the niche webs.

Students are requested to do some preliminary reading (e.g. at least (Levy 1997) and (Bedau et al 2000)) prior to commencing the project. Enrollment in the Honours unit FIT4012 is strongly encouraged as it will provide background material of direct relevance to the topic.

References:

  • Bedau, M.A., et al., Open Problems in Artificial Life. Artificial Life, 2000. 6(4): p. 363-376.
  • A. Dorin and K B Korb: Network measures of ecosystem complexity. Artificial Life XII. The MIT Press, pp. 323-328.
  • Levy, S., Artificial Life, the quest for a new creation. 1992, London: Penguin Books. 390

Investigating Continuous Opinion Dynamics for Innovation Support Systems (24 pts)
Supervisor: Vincent Lee

Self-categorization theory (SCT) is a relatively new paradigm in social psychology that aims to explain the “psychological emergence of the group-level properties of social behaviour”. A formal model of self-categorization, with the aid of adaptive intelligent tools, could be used to build a new opinion dynamics model that is social-psychologically founded. In an open innovation domain, individual behaviour, if positively motivated, can lead to creative ideas for process, product, market and organisational innovations. As the scope of SCT is integrated group processes, SCT deals fundamentally with situations where a great number of individuals interact. These individual interactions are the driving inputs for developing an agile innovation support system. Blogging and microblogging messages typically generate complex collective phenomena, which make it difficult to anticipate the behaviour of individuals (with regard to interactions) and hence to support the generation of new ideas. Simulation is a reliable way of exploring the collective dynamics resulting from the hypotheses made at the individual level.

The broad scope of this project will investigate the behaviour of a continuous opinion dynamics model, inspired by social psychology. The project will also study the behaviour of the model for several network interactions and show that, in particular, consensus, polarization or extremism are possible outcomes, even without explicit introduction of extremist agents. The expected outcomes are to compare the results of the simulation to what is expected according to the theory, and to other opinion dynamics models.

Keywords:

Opinion Dynamics; Self-Categorization Theory; Consensus; Polarization; Extremism; Open Innovation.

Relevant Readings

Rao, Balkrishna C. (2010), On the methodology for quantifying innovations, International Journal of Innovation Management, vol. 14, no. 5, pp. 823-839.

Shin, Juneseuk, Park, Yongtae (2010), Evolutionary optimization of a technological knowledge network, Technovation, vol. 30, pp.612-626.

Laurent Salzarulo (2006). A Continuous Opinion Dynamics Model Based on the Principle of Meta-Contrast, Journal of Artificial Societies and Social Simulation vol. 9, no. 1

Amblard, F. and Deffuant, G. (2004), The role of network topology on extremism propagation with the relative agreement opinion dynamics, Physica A, vol. 343, pp. 725-738.

Salzarulo, L. (2004), Formalizing self-categorization theory to simulate the formation of social groups, presented at the 2nd European Social Simulation Association Conference, Valladolid, Spain, 16th-19th September 2004.


Intelligent Real-time Activities Recognition (24 pts)
Supervisors: Vincent Lee, Clifton Phua (I2R)

Real-time activity recognition is an emerging research area, especially for smart future city living. This project focuses on the automated recognition of activities and behaviours in smart homes and on providing assistance/intervention accordingly. We will carry out automated monitoring of basic Activities of Daily Living (bADL) and instrumental Activities of Daily Living (iADL) among single and multiple residents in smart homes. Technically, these objectives translate to significant advances in sensitivity and specificity in activity and plan recognition of finer-grained bADLs/iADLs for a single subject, and improved location tracking, object/human dissociation and activity recognition among multiple subjects.

Readings:

Norbert Győrbíró, Ákos Fábián, Gergely Hományi (2009), An Activity Recognition System For Mobile Phones, Mobile Networks and Applications, vol. 14, pp. 82-91. DOI 10.1007/s11036-008-0112-y.

Flora Dilys Salim, Jane Burry, David Taniar, Vincent Cheng Lee, Andrew Burrow (2010), The Digital Emerging and Converging Bits of Urbanism: Crowd-designing a Live Knowledge Network for Sustainable Urban Living, in Proceedings of the 28th Conference on Future Cities, eCAADe (Education and Research in Computer Aided Architectural Design in Europe), 2010.


User Interface Elements for iOS-based OLAP Tools (24 pts)
Supervisor: Rob Meredith

The adoption of iPhone and iPad devices by executives has seen a number of Business Intelligence vendors release iOS versions of their OLAP tools. There has been little research investigating the use of iOS interface gestures for typical OLAP functions that allow navigation of multi-dimensional data structures.

This project will implement simple OLAP tools on an iOS device to conduct experiments to test the efficacy of various OLAP navigation techniques.

The project will be suitable for a bachelor or masters honours student with a background in programming and an interest in data visualization, decision support, data analytics or business intelligence. Students should have access to either an iPhone or iPad as well as an Apple computer for development work. Students will not need to pay for a developer license, as this will be covered by Monash.


Social Media use by Managers (24 pts)
Supervisor: Rob Meredith

The use of social media, or Web 2.0, technologies in Business Intelligence tools is growing in popularity. However, many organizations restrict or discourage the use of social media sites by employees, and it is unclear how popular the take-up of internal social media services is.

This project will survey managers in a variety of organizations to ascertain their use of social media sites for personal use, the level of adoption of enterprise 2.0 technologies in their organization, and their perceptions of the potential usefulness of social media in supporting their work.

This project will suit a bachelor honours or masters honours student with an information systems orientation. Students should have some appreciation for managerial work and Web 2.0 technologies.


Generation-Y: What do they think about Mobile Payment? (24 pts) Supervisor: Mahbubur Rahim

Payment methods have undergone drastic change in line with developments in science and technology. Mobile payments are payments for goods, services and bills/invoices made with a mobile device such as a mobile phone or personal digital assistant, leveraging wireless and other communication technologies. However, the acceptance and usage of mobile payment is relatively low despite the high penetration of mobile phone services in Australia. This project will investigate Generation-Y's perceptions of and intentions towards mobile payment using a modified Technology Acceptance Model (TAM). It will involve a survey among students for data collection.


Advanced Visualisation for Constraint Propagation and Search (24 pts) Supervisors: Guido Tack, Chris Mears

The project is suitable for both Honours and Minor Theses.

Optimisation problems arise almost everywhere. Advances in optimisation technology have made it possible to solve problems in diverse areas such as transport networks, production scheduling, energy grids, nurse rostering, university timetables, or protein structure prediction. One challenge with these approaches is that in order to improve their efficiency, it is vital to understand their behaviour at different levels. Current tools offer little to no support for this.

In this project, you will develop visualisation strategies for a particular class of optimisation solvers, those based on constraint propagation and search. A number of tools have been proposed over the years (see e.g. [1], [2], [3]), but they usually do not scale well to the massive search trees encountered in real-life applications. Your goal will therefore be to explore what kind of data can be extracted from these search trees, and to develop techniques for aggregating and visualising that data. The visualisation will be based on extensions of the tree map technique [4].

The project is suited to students with good mathematical and programming skills. Some experience with programming graphical user interfaces will be helpful.

Helmut Simonis, Paul Davern, Jacob Feldman, Deepak Mehta, Luis Quesada, Mats Carlsson: A Generic Visualization Platform for CP. In David Cohen (ed): CP 2010. LNCS 6308, Springer, 2010.

Christian Schulte: Oz Explorer: A Visual Constraint Programming Tool. In Lee Naish (ed): ICLP 1997. MIT Press, 1997.

Pierre Deransart, Manuel V. Hermenegildo, Jan Maluszynski (Eds.): Analysis and Visualization Tools for Constraint Programming, Constraint Debugging (DiSCiPl project). LNCS 1870, Springer, 2000.

http://www.cs.umd.edu/hcil/treemap-history/


Effective One-Pass Lossless Compression of Greyscale Images (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
A lossless compression of a grey-scale image allows the image to be stored in a minimal number of bits and for an image to be reconstructed that is bit-for-bit identical with the original image. A number of years ago a postgraduate student in this school developed glicbawls, the world's second-best program for the lossless compression of grey-scale images. (The same person had also developed the world's best lossless image compressor, TMW.) glicbawls was designed for the International Obfuscated C programming competition. The fundamental approach in glicbawls is one-pass, but the source code of the program itself is impossible to understand and very much tuned to the coding of natural images.

Project Aim and Basic Outline of Approach
The aim of the project is to produce a version of glicbawls which is easy to understand and modify. A non-obfuscated version of glicbawls might exist but we cannot rely on getting access to it. glicbawls has been documented in sufficient detail for it to be recreated from first principles. Recreating glicbawls should be accomplished by the start of the second semester of the project. The project can then explore how glicbawls can be tuned to perform well on particular classes of image, particularly medical images. Extension to 3D greyscale or to colour images should also be considered.

URLs and bibliographic references to background reading: See Peter TISCHER

Pre- and co-requisite knowledge and units studied: No prior knowledge of image processing or data compression is required for this project.


Blending Probability Distributions for Mixtures of Experts (18 or 24 pts)
Supervisor: Peter Tischer

For nearly 15 years TMW (Tischer-Meyer-Wombat) has been probably the world's best lossless compression technique for grey-scale images. The 'Meyer' in TMW is for Bernd Uwe Meyer, a.k.a. Bernie, a former postgraduate student in the School of Computer Science and Software Engineering and a different person from the current Associate Dean of Education in the Faculty of Information Technology, Dr. Bernd Meyer.

Nearly all lossless image coding schemes form a predicted value for the pixel value to be encoded next and encode the difference between the predicted pixel value and the actual pixel value. A distinguishing feature of TMW is that it actually forms a number of probability distributions of predicted pixel values and then takes a linear combination of these probability distributions to come up with the final probability distribution to be used in encoding the actual pixel value.
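
In symbols, the blending step is a mixture of the experts' distributions (a generic statement of the idea, not TMW's specific parameterisation):

```latex
p(x) \;=\; \sum_{i=1}^{k} w_i \, p_i(x), \qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1,
```

where each p_i is one expert's probability distribution over the next pixel value and the weights w_i might, for example, reflect how well each expert has been predicting so far.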

The Bernie Meyer approach could be termed a 'mixture of experts' approach as it combines the contributions from a number of different sources. This approach need not be restricted to lossless image compressions and has potentially very great significance to the general problem of Machine Learning.

Bernie Meyer came up with a model probability distribution which is easy to compute and where it is possible to form a linear combination of different instances of these probability distributions in a straightforward and efficient manner.

Project Aim and basic outline of research:
The aim of this project is to investigate a different way of representing a probability distribution for which it is still straightforward to produce a linear combination of probability distributions. This way retains the computational advantages of Bernie's model probability distribution but may have the advantage of having more degrees of freedom to allow a wider range of probability distributions to be modelled accurately.

Dr. Torsten Seemann developed an approach for lossless image compression where the results of a number of sub-predictors are blended to produce a final predicted value. Instead of blending predicted values produced by sub-predictors, as in Dr. Seemann's approach, a student could explore blending probability distributions associated with sub-predictors.

URLs and bibliographic references to background reading: See Peter TISCHER as well as:
T Seemann and PE Tischer: Generalized locally adaptive DPCM,
97/301, Department of Computer Science Technical Reports,
Department of Computer Science, Monash University, Melbourne, Australia

No prior knowledge of image processing or data compression is required for this project.


Measuring the Complexity of Networks and Graphs (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
In learning models to explain data we need to be able to trade off the complexity of our model against the model's ability to describe the data. The most effective way of doing this is by using a technique called MML - Minimum Message Length inductive inference. MML was developed by the foundation professor of Computer Science at Monash University, Professor Chris Wallace.

In order to apply MML we need to be able to say what the smallest number of bits is that we need to represent the model. Finding the smallest number of bits to represent a given amount of data is known as lossless data compression.

In the case of trying to explain data by using network models we need to be able to describe a network, or a graph description of the network, by using the smallest possible number of bits.
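
In MML terms (the standard two-part form, given here as a hedged sketch rather than the specific codes to be designed in this project), a candidate network model H for observed data D is scored by

```latex
\text{MsgLen}(H, D) \;=\; \underbrace{-\log_2 \Pr(H)}_{\text{bits to state the model}} \;+\; \underbrace{-\log_2 \Pr(D \mid H)}_{\text{bits to state the data given the model}},
```

and the preferred model is the one minimising this total.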

Project Aim and Basic Outline of Approach:
This project will investigate a number of ways of incorporating our prior knowledge about the properties of networks in such a way as to come up with the most effective probability models for describing graphs which represent networks. While developing techniques that can be applied to any kind of network, the project might pay particular attention to networks which arise in connection with social networks, such as the spread of ideas, or diseases, in a population, or with gene regulatory networks that arise in bioinformatics.

URLs and bibliographic references to background reading: See Peter TISCHER

Pre- and co-requisite knowledge and units studied: This project will not require any special knowledge in estimating probabilities, lossless data compression or of graph theory.


Measuring the complexity of a segmentation of a picture (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
A fundamental problem in image processing is recognising which pixels in an image belong together because they represent the same object. This is known as a segmentation. With image segmentation it is hard to find the best trade-off between having segmentations with large numbers of segments and small numbers of pixels in each segment and segmentations with relatively small numbers of large segments.

In order to apply state-of-the-art techniques like MML to resolve the correct level of complexity in a segmentation, it is necessary to know the minimum number of bits needed to describe the segmentation. This is equivalent to knowing how probable the segmentation is, given our prior belief about how likely any particular segmentation is.

Any segmentation of an image can be represented by a segment map. In a segment map, the entry at a particular row and column is the segment number for the pixel in the original image at the corresponding row and column location. Segment maps look like two-dimensional digital images and can be efficiently stored by using techniques for lossless image compression. Some of the world's best techniques for lossless compression of images have been developed by Computer Science researchers at Monash University.

Project Aim:
It is the aim of this project to apply and customise techniques for the lossless compression of grey-scale images to segment maps and to compare the performance of the new techniques against the performance of some existing techniques for storing segment maps.

URLs and bibliographic references to background reading: See Peter TISCHER

Pre- and co-requisite knowledge and units studied: This project will not require any special knowledge in estimating probabilities, lossless data compression or image processing.


Information Visualisation (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
Information Technology allows us to process vast amounts of digital data. However, this processing might also result in a vast amount of digital data. A human being is best able to process large amounts of information by looking at pictures. Information Visualisation is about taking large amounts of information and turning that into a picture or series of pictures.

In general, there will be so much information that it will exceed the amount that can be represented in a picture, and so information must be lost in creating a visualisation of that data. For instance, each item in our data might be represented by a vector of 'h' numbers and the visualisation may involve mapping each item to a vector of 'l' numbers, where 'l' is no bigger than 'h'.

With the proviso that 'l' is strictly less than 'h', in Mathematics a function which does this mapping is called a projection. Projections can be regarded as a way of simultaneously mapping all points in the higher-dimensional space to points in a lower-dimensional space. The most widely-used approach in information visualisation is to map the original data to points in a lower-dimensional visualisation space by using a projection. Then the output image is drawn as a cloud of points.
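
As a minimal illustration of the projection idea (synthetic data; any real visualisation tool would be far more elaborate), the sketch below maps items described by h = 8 numbers down to l = 2 coordinates with a principal-component projection computed via the SVD, ready to be drawn as a cloud of points.

```python
# Project 8-dimensional items to 2-D via PCA computed with a plain SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))          # 500 items, h = 8 numbers each

Xc = X - X.mean(axis=0)                # centre the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc @ Vt[:2].T              # keep the l = 2 strongest directions
print(projected.shape)                 # (500, 2)
```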

Project Aim and basic outline of approach:
Whereas Mathematicians might think about using projections to map points in the original space to points in a visualisation space, human beings understand pictures in terms of two-dimensional regions whose boundaries are defined by curves. The aim of this project is to take a 'Computer Science' approach to visualisation. This can involve, among other things, the idea that instead of producing a single output image which consists of a cloud of points, an information visualisation can consist of a number of two-dimensional regions, separated by curves at their mutual boundaries, and where the information in the output image can be revealed progressively as each point is processed, or the user can determine what information in the output image is revealed or suppressed.

URLs and bibliographic references: See Peter TISCHER

Pre- and co-requisite knowledge and units studied: This project will not require any special knowledge in computer graphics or mathematics. Students who have not taken Computer Graphics might either want to take it in first semester or to sit through its lectures in first semester. A fondness for creating pretty pictures will be an advantage but this project has the potential to explore a number of deeper issues.


Image Denoising for High Resolution Digital Photography (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
While a PhD student in the School of Computer Science and Software Engineering at Monash University, Dr. Torsten Seemann developed the local segmentation approach for the removal of noise from digital images.

In this approach, each time a pixel is processed, the pixels in a window that includes the pixel to be processed are examined and a decision is made as to whether the window covers one segment or two segments. In subsequent processing, only those pixels which are deemed to be in the same segment as the pixel to be processed are used in the processing of that pixel.

Thus, if we have a black square on a white background, the intention is to combine only pixels which are either white or black and to avoid producing pixels which are a mixture of black and white.
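
A toy version of this idea on a one-dimensional signal might look like the sketch below (hypothetical thresholding rule; Dr. Seemann's method uses a more principled test for deciding how many segments a window contains, and works on 2-D windows).

```python
# Toy local-segmentation denoising on a 1-D signal (illustrative only).
import numpy as np

def denoise(signal, radius=2, split_threshold=20):
    out = np.empty_like(signal, dtype=float)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi].astype(float)
        if window.max() - window.min() < split_threshold:
            out[i] = window.mean()            # window covers one segment
        else:
            cut = (window.max() + window.min()) / 2
            same = window[(window >= cut) == (signal[i] >= cut)]
            out[i] = same.mean()              # average within own segment only
    return out

edge = np.array([10, 12, 11, 9, 200, 198, 202, 199], dtype=float)
print(denoise(edge))                          # the edge stays sharp
```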

Project aim and basic outline of research:
The aim is to characterise the various sources of noise in digital images and to tailor the basic local segmentation approach to image denoising to take into account the special characteristics of the noise sources. Of particular interest will be removing granular noise in images which have been scanned from film.

Another aim will be to find quick and computationally inexpensive ways of deciding how many segments are covered by a window and which pixels belong to a particular segment.

URLs and bibliographic references to background reading: See Peter TISCHER
Dr. Seemann's PhD thesis is available at www.csse.monash.edu/~torsten

Pre- and co-requisite knowledge and units studied:
No prior knowledge of image processing or data compression is required for this project. Many image denoising techniques in the image processing literature involve the use of advanced signal processing techniques from 1-D signal processing or advanced mathematics like partial differential equations. The techniques used in this project will not use any complex or advanced mathematics whatsoever.


Parallel Hierarchical Image Segmentation (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
The key operation in computer manipulation of digital images is determining which pixels can be grouped together because they represent the same object. This is called segmentation. It is something the human visual system does very well in real time and which computer programs do not do well at all.

In the real world objects can be part of larger objects or can themselves contain smaller objects. This leads to hierarchical segmentations.

A major change in computer systems has been the move away from ever faster single-processor systems to multi-core processors and multiprocessor systems. At the moment dual-core and quad-core processors are becoming widespread. However, GPUs (Graphics Processing Units) are examples of architectures where much greater levels of concurrent execution of programs are possible.

Project aim and basic outline of research:
The primary aim is to explore a simple and quick way of segmenting images. This algorithm is similar in complexity to common algorithms for computing minimum-cost spanning trees of large graphs. Its complexity is O(n log n), where 'n' is the number of pixels in a digital image and can be of the order of millions.

The algorithm will be able to handle large images where those images might be colour or multi-spectral. In addition, the algorithm will be able to segment images based on texture as well as raw pixel values.
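To indicate the flavour of such an algorithm, here is a small sketch (a Kruskal-style merge over 4-neighbour edges using union-find; the fixed merging threshold is an invented placeholder rather than the criterion the project would actually develop):

    import numpy as np

    def segment(img, threshold=10.0):
        """Sort 4-neighbour edges by weight and merge regions below a threshold."""
        h, w = img.shape
        parent = list(range(h * w))

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]   # path halving keeps the forest shallow
                a = parent[a]
            return a

        edges = []
        for y in range(h):
            for x in range(w):
                i = y * w + x
                if x + 1 < w:
                    edges.append((abs(float(img[y, x]) - float(img[y, x + 1])), i, i + 1))
                if y + 1 < h:
                    edges.append((abs(float(img[y, x]) - float(img[y + 1, x])), i, i + w))

        # O(n log n) overall: the cost is dominated by sorting the edge list.
        for weight, a, b in sorted(edges):
            if weight > threshold:
                break
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

        labels = np.array([find(i) for i in range(h * w)]).reshape(h, w)
        return labels

    img = np.random.default_rng(2).integers(0, 256, size=(64, 64))
    print(len(np.unique(segment(img))))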

URLs and bibliographic references to background reading: See Peter TISCHER

Pre- and co-requisite knowledge and units studied: No prior knowledge of image processing, data compression or parallel programming is required for this project.


A Correlation for the twenty-first century (18 or 24 pts)
Supervisor: Peter Tischer

Background Information:
The first thing people think of when asking whether two variables are related is to compute the correlation coefficient of the two variables. The correlation coefficient is a measure of the extent to which you can approximate the values of one variable by a straight line function of the values of the other variable.

The correlation coefficient is thus a way for measuring linear statistical dependency of the values of two variables. However, the values of two variables might be related in a nonlinear way. Take, for example, the case of the x and y co-ordinates of points on a circle. Once we know one co-ordinate, there are only two possible values the other co-ordinate can take.

Over the years there have been many attempts to develop a way of measuring statistical dependency, and not just linear statistical dependency. One promising approach is detailed in a recently published paper in Science. Most of these approaches involve estimating the amount of mutual information, the amount of information which is common to both variables.
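The circle example can be made concrete in a few lines (a sketch assuming numpy; the histogram-based mutual information estimate is a crude plug-in estimator, not the method of the Science paper):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 5000)
    x, y = np.cos(theta), np.sin(theta)             # points on a circle: strongly dependent

    print("correlation:", np.corrcoef(x, y)[0, 1])  # close to 0 despite the dependency

    # Crude plug-in estimate of mutual information from a 2-D histogram.
    counts, _, _ = np.histogram2d(x, y, bins=20)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    mi = np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero]))
    print("mutual information (nats):", mi)         # clearly positive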

Project aim and basic outline of research:
The project involves surveying a number of approaches for computing measures of statistical dependency between variables. At least one of these will be the approach described in the Science paper. Some of those approaches will be implemented. By using ideas from data compression and MML, these approaches will be critiqued and a possibly better technique might be developed.

URLs and bibliographic references to background reading:
"Detecting Novel Associations in Large Data Sets", Science, 334, 2011, pp 1518-1524
See also Peter TISCHER

Pre- and co-requisite knowledge and units studied: There is no need for special knowledge of statistics or ways of estimating probabilities. Probabilities will be estimated from the data. The project will not involve deriving theoretical results.


Learning from Very Large Data (24 pts)
Supervisor: Geoff Webb

Background
Machine learning is a fundamental technology that underlies the core features of many modern businesses such as social networking, internet search and online commerce. Demand for graduates with advanced machine learning skills is extremely high. Many advanced applications of machine learning have access to extraordinarily large quantities of data. However, there is a paradox: most modern machine learning techniques with the theoretical capability to produce the most accurate classifiers from large data (those with very low asymptotic error) have computational complexity that makes them infeasible to apply to large data (it is super-linear in the data quantity).

Project aim and basic outline of approach
This project will explore approaches to developing feasible classifiers with low asymptotic error that have linear or sub-linear computational complexity. As a starting point we will look at modifications to the Averaged N-Dependence Estimators algorithm that reduce its complexity with respect to its parameter N.
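To see why counting-based Bayesian classifiers can be linear in the amount of data, here is a small sketch of the simplest member of that family, plain naive Bayes with Laplace smoothing (an illustration of the single-pass counting idea, not the project's algorithm; the toy data set is invented):

    from collections import defaultdict

    def train_naive_bayes(data):
        """One pass over the data: O(N) counting of class and attribute-value frequencies."""
        class_counts = defaultdict(int)
        attr_counts = defaultdict(int)        # keyed by (class, attribute index, value)
        for *attrs, label in data:
            class_counts[label] += 1
            for i, v in enumerate(attrs):
                attr_counts[(label, i, v)] += 1
        return class_counts, attr_counts

    def predict(class_counts, attr_counts, attrs):
        n = sum(class_counts.values())
        best, best_score = None, float("-inf")
        for c, cc in class_counts.items():
            score = (cc + 1) / (n + len(class_counts))            # smoothed class prior
            for i, v in enumerate(attrs):
                score *= (attr_counts[(c, i, v)] + 1) / (cc + 2)  # smoothed likelihood (binary attributes)
            if score > best_score:
                best, best_score = c, score
        return best

    data = [(1, 0, "spam"), (1, 1, "spam"), (0, 0, "ham"), (0, 1, "ham")]
    model = train_naive_bayes(data)
    print(predict(*model, (1, 0)))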

URLs and bibliographic ref's to background reading
http://www.csse.monash.edu.au/~webb/cgi-bin/publications.cgi?keywords=Conditional%20Probability%20Estimation
https://www.cse.ust.hk/~jamesk/papers/jmlr05.pdf
http://www.springerlink.com/content/mt27q3g813631k79/
http://www.springerlink.com/content/lgmknt1jx7crkaeu/

Pre- and co-requisite knowledge and units studied as appropriate
Either advanced Java programming skills or a strong mathematical background is required.


Non-parametric Bayes versus Model Selection for Topic Modelling
Supervisors: Dr Mark Carman and A./Prof. David Dowe

Background

Topic Models are a form of generative (ad)mixture model frequently used to identify topics within a body (or corpus) of documents and to perform a "soft clustering" of documents into topics. When the number of topics is not known in advance (a priori), it should also be estimated from the corpus. A model selection technique such as Minimum Message Length (MML) or a related approach could be applied to trade off model complexity against goodness of fit to the data. The current literature on Topic Modelling uses a variety of mathematically complicated (non-parametric Bayesian) techniques (e.g., Dirichlet and Pitman-Yor Processes) to infer the (supposedly) "optimal" number of clusters. Such techniques have become popular as they are relatively simple to implement (as part of a numerical Gibbs sampling routine). The Minimum Message Length (MML) principle from Bayesian information theory (Wallace (2005), Dowe (2011a)) enables us (given sufficient data) to infer any computable or expressible model from data (e.g., Wallace & Dowe (1999a) and chapter 2 of Wallace (2005)). It has previously been applied to clustering and mixture modelling, but not to this specific problem.

Aims and outline

This project proposes to compare empirically the performance of the two approaches (model selection and non-parametric Bayes) to admixture modelling. From a theoretical perspective, we will attempt to better understand under what conditions the two methods for choosing the number of parameters of the model are indeed equivalent, and what the relative computational requirements (expected convergence rates) of the two approaches are.
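As a toy illustration of the model-selection side only, the sketch below chooses the number of components of a Gaussian mixture by penalised fit (it uses scikit-learn and BIC as a crude stand-in for an MML message-length trade-off; it is not an MML calculation and says nothing about the non-parametric Bayesian side):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Hypothetical data drawn from 3 clusters; the task is to recover that number.
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in (0, 5, 10)])

    scores = {}
    for k in range(1, 8):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        # BIC trades goodness of fit against model complexity, in the same spirit
        # (though not with the same formula) as a two-part message length.
        scores[k] = gm.bic(X)

    print("selected number of components:", min(scores, key=scores.get))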

URLs and references

Wallace (2005) C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length".

Pre- and Co-requisite knowledge

The work will become heavily mathematical, and no less so when we use Minimum Message Length (MML).


Segmenting Irregularly Spaced Data
Supervisors: Peter Tischer and David Albrecht

Background

Clustering is a fundamental activity in data analysis in which we divide the data into groups that share some common property. In the special case where the group members are spatially connected, a clustering is called a segmentation. For example, if you were to cluster the colours of pixels in an image of a face, the colours of pixels representing the pupils of the eyes would appear in the same cluster. However, if we were to segment the pixels in the image, the pixels for the pupil of the left eye should be in a different segment from the pixels for the pupil of the right eye, because normally the pupils are not spatially connected.
In order to determine whether two things, A and B, are spatially connected, we need to be able to find a path connecting A and B where each thing in the path is a neighbour of the thing immediately before it and the thing immediately after it. With regularly spaced data like images and video, it is easy to determine whether two things are neighbours. With irregularly spaced data, determining neighbours is not as straightforward.
A popular way of determining neighbours in irregularly spaced data is to use Voronoi diagrams. In a Voronoi diagram all the cells are disjoint and distinct points belong to different cells. We say two points A and B are 'natural' neighbours if their cells in a Voronoi diagram are neighbours.
Once we have established which points are neighbours, a quick way to segment data is to use an approach called single-link (nearest-neighbour) hierarchical clustering. This is equivalent to computing a minimal spanning tree of the data.
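A minimal sketch of this pipeline, assuming scipy is available (the Delaunay triangulation is the standard way to read off the Voronoi 'natural' neighbours, and cutting a minimum spanning tree at a threshold gives the single-link segments; the threshold is illustrative only):

    import numpy as np
    from scipy.sparse import coo_matrix
    from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(0)
    points = rng.uniform(size=(200, 2))              # irregularly spaced 2-D data

    tri = Delaunay(points)
    indptr, indices = tri.vertex_neighbor_vertices   # natural-neighbour adjacency (CSR-style)

    # Build a graph whose edges join natural neighbours, weighted by distance.
    rows, cols = [], []
    for i in range(len(points)):
        for j in indices[indptr[i]:indptr[i + 1]]:
            rows.append(i)
            cols.append(int(j))
    weights = np.linalg.norm(points[rows] - points[cols], axis=1)
    graph = coo_matrix((weights, (rows, cols)), shape=(len(points), len(points)))

    # Single-link segmentation: MST edges shorter than a threshold define the segments.
    mst = minimum_spanning_tree(graph).toarray()
    mst[mst > 0.08] = 0                              # illustrative cut-off
    n_segments, labels = connected_components(coo_matrix(mst), directed=False)
    print(n_segments)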

Aims and outline

The aim of the project is to produce a program which, given irregularly spaced data in a d-dimensional space, computes the natural neighbours of each data point and then segments the data. Work done in the Ph.D. thesis of a Monash graduate, Ben Goodrich, suggests that if we know how our data can be segmented, we can come up with an effective machine learning algorithm which uses SVMs (Support Vector Machines).

A modified version of this program can also be used to carry out supervised classification. Given a set of d-dimensional points which have been labelled, compute the natural neighbours of a given query point and come up with a label for the query point by looking at the labels of the natural neighbours. For example, if the labels are either CANCER or NON-CANCER and the points represent people, then for a new person determine whether they have CANCER or NON-CANCER based on the labels of their natural neighbours.

URLs and references

See the Wikipedia entry on Natural Neighbors and Natural Neighbor Interpolation.

Please come and talk to Peter TISCHER and David ALBRECHT.


ANZ Project: MML time series and Bayesian nets with discrete and continuous attributes
Supervisors: A./Prof. David Dowe

Background

The first application of MML to Bayesian nets including both discrete and continuous-valued attributes was in Comley & Dowe (2003), refined in Comley & Dowe (2005) [whose final camera-ready version was submitted in Oct 2003], based on an idea in Dowe & Wallace (1998). The Minimum Message Length (MML) principle from Bayesian information theory (Wallace (2005), Dowe (2011a)) enables us (given sufficient data) to infer any computable or expressible model from data (e.g., Wallace & Dowe (1999a) and chapter 2 of Wallace (2005)). One of the particular specialties of MML is when the amount of data per parameter is sparse, such as in the Neyman-Scott (1948) problem. In such cases, the classical approach of Maximum Likelihood and many other approaches converge to the wrong answer (even for arbitrarily much data), but Dowe & Wallace (1997), chap. 4 of Wallace (2005) and sec. 6 of Dowe (2011a) all show MML doing well.

Aims and outline

We seek to extend this original work to Bayesian nets which can change with time, using the mathematics of MML time series in Fitzgibbon, Dowe & Vahid (2004). The student will be required to: understand the relevant underlying mathematics, develop the necessary mathematics, develop software for the relevant mathematics, and test and apply the software on real-world data.

URLs and references

Comley & Dowe (2003); Comley & Dowe (2005); Dowe (2008a); Dowe (2011a); Dowe & Wallace (1998); Fitzgibbon, Dowe & Vahid (2004); Wallace (2005); Wallace & Dowe (1999a)

Pre- and Co-requisite knowledge

The ability to program is essential. The work will use Minimum Message Length (MML) and will become quite mathematical.


MML inference of SVMs, DEs, time series, etc.
Supervisors: A./Prof. David Dowe

Background

The Minimum Message Length (MML) principle from Bayesian information theory (Wallace (2005), Dowe (2011a)) enables us (given sufficient data) to infer any computable or expressible model from data (e.g., Wallace & Dowe (1999a) and chapter 2 of Wallace (2005)). When the amount of data per parameter is scarce, such as in the Neyman-Scott (1948) problem, the classical approach of Maximum Likelihood and many other approaches converge to the wrong answer (even for arbitrarily much data), but Dowe & Wallace (1997), chap. 4 of Wallace (2005) and sec. 6 of Dowe (2011a) all show MML doing well. The information-theoretic log-loss (logarithm of probability) scoring system is unique in being invariant to the parameterisation of questions (Dowe, 2008a, footnote 175; Dowe, 2011a, sec. 3). This gives even further justification to the MML approach. Given this generality, indeed this universality, MML can be reliably applied to any inference problem. We list here three of several (or infinitely many?) possible examples - namely, inference of support vector machines (SVMs), inference of differential equations (DEs) and/or inference of econometric time series, among many other examples. Applications include modelling of dynamical systems and modelling of financial markets.

Aims and outline

We focus on one specific project - be it inference of support vector machines (SVMs), inference of differential equations (DEs), inference of econometric time series, or another related problem. We then use Minimum Message Length (MML) to infer the model from the data. We compare with alternative methods on both artificially generated data and real-world data.

The student will be required to: understand the relevant underlying mathematics, develop the necessary mathematics, develop software for the relevant mathematics, and test and apply the software on real-world data.

URLs and references

Comley & Dowe (2003); Comley & Dowe (2005); Dowe (2007); Dowe (2008a); Dowe (2011a); Dowe, Gardner & Oppy (2007); Dowe & Wallace (1997); Fitzgibbon, Dowe & Vahid (2004); Tan & Dowe (2004); Wallace (2005); Wallace & Boulton (1968); Wallace & Dowe (1999a)

Pre- and Co-requisite knowledge

The ability to program is essential. The work will use Minimum Message Length (MML) and will become quite mathematical no matter what direction the project takes.


(Google) Maps Databases
Supervisors: Assoc. Prof. David Taniar

Background

Are you interested in developing algorithms to solve queries such as: "Given a map containing some places of interest, find the three closest places of interest to a given location"; or "Given a map containing 100 objects of interest, draw a graph that represents the nearest neighbour of each object"?

Aims and outline

This project aims to develop efficient algorithms to process spatial database queries, by incorporating some of the properties of computational geometry.
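For instance, the 'three closest places of interest' query can be answered with a spatial index; here is a minimal sketch assuming scipy's cKDTree (a k-d tree is one standard computational-geometry structure such a project might build on, replace or compare against), with invented coordinates:

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    places = rng.uniform(0, 100, size=(1000, 2))    # hypothetical places of interest (x, y)

    tree = cKDTree(places)                          # spatial index over the map
    query_location = np.array([50.0, 50.0])

    distances, indices = tree.query(query_location, k=3)   # the three nearest neighbours
    print(indices, distances)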

URLs and references

http://en.wikipedia.org/wiki/Nearest_neighbor_graph

Pre- and co-requisite knowledge

A strong interest in mathematics, including geometry, and a passion for solving puzzles (e.g. what does the picture shown at http://en.wikipedia.org/wiki/Nearest_neighbor_graph mean?)


Security Analysis of NTLM Authentication Protocol
Supervisors: Ron Steinfeld

Background

NTLM is a widely used authentication protocol designed to secure remote login in several Microsoft network applications. Despite its popular use, its security properties are not well understood. The protocol has some known security weaknesses. Identifying further weaknesses is critical to understanding the risks associated with its use.

Aims and outline

The aim of this project is to investigate and improve current understanding of the NTLM protocol's security and its vulnerabilities. In particular, the project will explore the feasibility of adapting known attacks on other similar protocols, such as WEP, to break one or more security goals of NTLM under suitable conditions. Part of the project will involve implementing and testing known and new attacks using the open-source implementation of NTLM. Other topics in cryptography-related areas are available for interested students.
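For readers unfamiliar with this class of protocol, the sketch below shows the general shape of a keyed challenge-response exchange. It is a simplified illustration only; it uses HMAC-SHA256 rather than NTLM's actual message formats and cryptographic primitives:

    import hashlib
    import hmac
    import os

    def derive_key(password: str) -> bytes:
        # Illustrative key derivation only; this is NOT the hashing NTLM actually uses.
        return hashlib.sha256(password.encode("utf-8")).digest()

    # The server issues a random challenge; the client proves knowledge of the shared
    # secret by returning a MAC over that challenge, without sending the password itself.
    server_challenge = os.urandom(8)
    client_response = hmac.new(derive_key("hunter2"), server_challenge, hashlib.sha256).digest()

    # The server recomputes the expected response and compares in constant time.
    expected = hmac.new(derive_key("hunter2"), server_challenge, hashlib.sha256).digest()
    print(hmac.compare_digest(client_response, expected))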

URLs and references

[1] The NTLM Authentication Protocol and Security Support Provider -- http://davenport.sourceforge.net/ntlm.html (This page describes the NTLM protocol).
[2] "Intercepting Mobile Communications: The Insecurity of 802.11" -- http://www.isaac.cs.berkeley.edu/isaac/wep-draft.pdf (This paper describes attacks on the WEP protocol).
[3] "The Java CIFS Client Library" -- http://jcifs.samba.org/ (This site contains an open-source implementation of the NTLM protocol).

[4] "Understanding the Windows SMB NTLM Authentication Weak Nonce Vulnerability" -- http://www.ampliasecurity.com/research/NTLMWeakNonce-bh2010-usa-ampliasecurity.pdf(This presentation explains some known vulnerabilities of NTLM).

Pre- and co-requisite knowledge

Familiarity with the basics of cryptography would be an advantage. The student should have good mathematical and programming skills.


Fluid visual interfaces for exploring graph databases
Supervisors: Tim Dwyer, Michael Wybrow

Background

Graph databases are a type of "NoSQL" database that is quickly gaining popularity. For certain types of data, modelling that data with a graph (or network) is a much better fit than the traditional tabular relational database model. Various textual languages exist to query graph databases (e.g. SPARQL, Gremlin); however, a much more natural way to allow people to explore this type of data is through direct manipulation of interactive visuals. Yet current techniques for doing this are rudimentary and not very friendly.

Aims and outline

This project is about creating fast, fluid ways for people to interact with graph data using direct interaction exploiting HTML5 and multitouch.


Finding useful associations in data
Supervisors: Geoff Webb

Background

Association discovery is an important area of data mining that identifies associations between factors in data.

Many statistically significant associations are of little interest in practice because they are trivially implied by other associations. For example, if it is known that having prostate cancer implies being male, and that prostate cancer is associated with abdominal pain, then the fact that being male and having prostate cancer is associated with abdominal pain is not informative. Non-redundant associations suppress a limited class of such associations (including the example above). Non-derivable associations suppress a much larger class of such associations. It has been argued that the inference rules on which non-derivable associations are based are stronger than those naturally understood by an average user.
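The prostate-cancer example can be checked with a few lines of counting (a sketch; the toy records below are invented):

    # Toy records (invented): each set lists the items that hold for one patient.
    records = [
        {"male", "prostate-cancer", "abdominal-pain"},
        {"male", "prostate-cancer"},
        {"male"},
        {"female", "abdominal-pain"},
        {"female"},
    ]

    def support(itemset):
        return sum(itemset <= r for r in records) / len(records)

    def confidence(antecedent, consequent):
        return support(antecedent | consequent) / support(antecedent)

    # Because prostate cancer implies being male in the data, adding "male" to the
    # antecedent cannot change the confidence, so the longer rule is uninformative.
    print(confidence({"prostate-cancer"}, {"abdominal-pain"}))
    print(confidence({"prostate-cancer", "male"}, {"abdominal-pain"}))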

Aims and outline

This project will investigate whether this is true, and will develop and explore intermediate constraints between non-redundant and non-derivable associations to assess their relative value for discarding associations that will not be useful to normal users.

URLs and references

http://www.csse.monash.edu.au/~webb/Files/Webb11.pdf
http://www.csse.monash.edu.au/~webb/redirects/Webb10.html

http://arxiv.org/pdf/cs.DB/0206004

Pre- and co-requisite knowledge

Students will require strong programming skills, preferably in C++ or Java.


Predicting defects in software components
Supervisors: Yuan-Fang Li, Reza Haffari

Background:
Software systems are becoming increasingly large and complex. Software quality assurance (QA) usually faces resource constraints in budget, personnel and time. Hence, the efficient allocation of QA resources is of high importance in maintaining high quality standards. As a result, the ability to predict the defectiveness of software components (modules, classes, files, methods, etc.) through software metrics is an area of great practical value.

Aim and Outline
The aim of this project is to develop novel software defect prediction (SDP) frameworks that make use of advanced machine learning techniques. Specifically, we will explore the following problems: (1) novel process-oriented metrics, (2) new learning algorithms that are able to give evidence for prediction outcomes, and (3) new ranking algorithms that are able to rank components by predicted defect density/severity.
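A minimal baseline for problems (1) and (3) might look like the sketch below (assuming scikit-learn; the synthetic features stand in for real process and code metrics such as lines of code, churn or complexity):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Placeholder metric features per software component.
    X = rng.normal(size=(500, 4))
    y = (X[:, 1] + X[:, 3] + rng.normal(scale=0.5, size=500) > 1).astype(int)  # 1 = defective

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    # Rank components by predicted defect probability (problem 3 in the outline above).
    scores = clf.predict_proba(X_te)[:, 1]
    ranking = np.argsort(-scores)
    print("held-out accuracy:", clf.score(X_te, y_te))
    print("five most defect-prone components:", ranking[:5])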

URLs and References
[1] Lessmann, Stefan, et al. "Benchmarking classification models for software defect prediction: A proposed framework and novel findings." Software Engineering, IEEE Transactions on 34.4 (2008): 485-496. (http://ieeexplore.ieee.org/document/4527256/)
[2] Menzies, Tim, Jeremy Greenwald, and Art Frank. "Data mining static code attributes to learn defect predictors." Software Engineering, IEEE Transactions on 33.1 (2007): 2-13. (http://ieeexplore.ieee.org/document/4027145/)
[3] Menzies, Tim, et al. "Defect prediction from static code features: current results, limitations, new approaches." Automated Software Engineering 17.4 (2010): 375-407. (http://link.springer.com/article/10.1007/s10515-010-0069-5)
[4] Lewis, Chris, et al. "Does bug prediction support human developers? findings from a google case study." Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013. (http://dl.acm.org/citation.cfm?id=2486838)
[5] Moser, Raimund, Witold Pedrycz, and Giancarlo Succi. "A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction." Software Engineering, 2008. ICSE'08. ACM/IEEE 30th International Conference on. IEEE, 2008. (http://ieeexplore.ieee.org/document/4814129/)

Pre- and Co-requisite Knowledge
Strong programming skills
Basic knowledge of software engineering, machine learning or data mining
One of: FIT3080/FIT4004/FIT4009/FIT5171/FIT5047/FIT5142


Environmental Sensing with Swarms of Flying Robots
Supervisors: Jan Carlo Barca and Karan Pedramrazi

Background
Swarms of robots that are capable of carrying out environmental sensing tasks offer an edge over traditional static sensor networks as the sensor carriers can move about autonomously in order to fulfil the most recent task requirements.

Aim and Outline
This project aims to devise mechanisms that enable swarms of quadcopters to harvest environmental data and synthesise the captured information into three-dimensional maps. The student will work within the Monash Swarm Robotics Laboratory and attend weekly meetings with researchers in the lab. This is a great opportunity for the selected student to learn about swarm robotics and work within a multidisciplinary team consisting of software, mechanical and electrical engineers.

URLs and References
Brambilla, M., Ferrante, E., Birattari, M. and Dorigo, M. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf

Kumar, V. and Michael, N. (2012) "Opportunities and challenges with autonomous micro aerial vehicles", International Journal of Robotics Research, vol. 31, issue 11, pp. 1279-1291. Available: http://www.isrr-2011.org/ISRR-2011/Program_files/Papers/kumar-ISRR-2011.pdf

Pre- and Co-requisite Knowledge
Advanced C++ programming experience and a strong desire to work with sensor systems and flying robots are essential.



Statistical Topic Models for Text Segmentation
Supervisors: Reza Haffari and Ingrid Zukerman

Background
Text segmentation is the problem of taking a contiguous piece of text, for example closed-caption text from news video, and dividing it up into coherent sections. The segmentation problem has been researched in Natural Language Processing mainly in the context of discourse segmentation. Various segmentation models have been proposed based on (i) lexical cohesion to capture topical aspects of utterances, (ii) entity chains to capture interaction between the form of linguistic expression and local discourse coherence, and (iii) cue phrases.

Aim and Outline

Statistical topic models are the current state-of-the-art for text segmentation. In this project we will augment topic models with additional sources of information, e.g., those coming from a domain expert, to enhance text segmentation.
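A minimal sketch of the lexical-cohesion signal that such models build on is given below (a TextTiling-style baseline that places a boundary where adjacent sentences share the fewest words; it is an illustration only, not a statistical topic model, and the sentences are invented):

    import math
    from collections import Counter

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        return dot / (math.sqrt(sum(v * v for v in a.values())) *
                      math.sqrt(sum(v * v for v in b.values())) + 1e-12)

    sentences = [
        "the match went to five sets", "the champion served brilliantly",
        "interest rates rose again today", "the central bank warned of inflation",
    ]
    bags = [Counter(s.split()) for s in sentences]

    # Low cohesion between adjacent sentences suggests a segment boundary.
    gaps = [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]
    boundary = min(range(len(gaps)), key=gaps.__getitem__)
    print("place a boundary after sentence", boundary + 1)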

Pre- and Co-requisite Knowledge
FIT3080 Intelligent systems or equivalent is a mandatory prerequisite, and solid knowledge of probability is desirable.


Two Approaches to Optimising Chemical Engineering Design
Supervisors: Maria Garcia de la Banda, Mark Wallace

Background
Motivating Simulation
Typically the design of a complex system is formulated as a set of parameter settings. In such systems it is hard to predict how the parameter settings impact the performance of the system (for example how the length, width and curvature of an aeroplane wing affects its flying performance). Consequently the specified system must either be built and tested or, more usually, simulated. Simulation, using well-established tools such as Aspen, is used for evaluating the design of chemical processing plants.

Addressing the Computational Cost of Simulation
Given just 20 parameters with 5 alternative settings each, the number of alternative resulting designs is 5^20, which is around 10^14. If a single simulation takes a minute of computer resource, it would take nearly two hundred million years to evaluate all the alternatives. Instead, the simulation-optimisation community uses heuristic techniques such as simulated annealing or genetic algorithms. These techniques seek a high-quality solution by modifying previous solutions and using simulation to determine whether the new solution is better or worse.

Multiple-Objective Optimisation (MOO)
If there are multiple objective criteria – such as cost, throughput, and CO2 emissions - it no longer suffices to find a single good solution. Instead the evaluation procedure needs to explore many solutions to reveal the trade-offs between the different criteria. The number of solutions necessary to reveal the “efficient frontier”, where no criteria can be improved without degrading another one, may be in the hundreds or thousands. Even using the established techniques of simulation-optimisation, the multiple-objective optimisation problem (“MOO”!) is computationally prohibitive.

The computational Cost of MOO
At Monash, chemical engineering researchers are investigating tractable approaches for solving the MOO problem. The idea (of the “Nondominated Sorting Genetic Algorithm – NSGA-II”) is to find additional solutions along the efficient frontier by modifying and combining previous solutions on the frontier. As with any heuristic method, there is no guarantee of the quality of the results. For simulation-optimisation, search can be focussed around the current best solution, but for solving the MOO problem, search must be “spread” along the whole efficient frontier. Consequently the chances of the heuristic algorithm failing to find solutions on the efficient frontier are very high. Moreover, with each new criterion added to the set of objectives, the size of the frontier grows dramatically. Consequently there is an urgent need to find novel scalable approaches for the MOO problem.
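The notion of the efficient frontier used above can be made concrete with a small nondominated filter (a sketch in Python; it assumes all criteria are to be minimised and that the candidate evaluations, invented here, are already available):

    def dominates(a, b):
        """a dominates b if it is no worse in every criterion and better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def efficient_frontier(solutions):
        return [s for s in solutions
                if not any(dominates(other, s) for other in solutions if other is not s)]

    # Hypothetical (cost, CO2 emissions) evaluations of candidate plant designs.
    designs = [(10, 9), (12, 6), (11, 8), (15, 5), (14, 7)]
    print(efficient_frontier(designs))   # (14, 7) is dropped: (12, 6) dominates it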

Aim and Outline
Whatever approach is adopted for multi-objective optimisation of a chemical plant, it is inevitable that a large number of parameter-setting combinations must be evaluated. Consequently, it is essential that the computationally expensive simulation step be taken out of the evaluation loop. There are two possible ways of achieving this:

  1. Plants can be modelled using non-linear equations and inequalities. Interval reasoning is an option for optimising models of this kind – assuming the number of parameters is small enough.
  2. Plants can be broken down into a network of processing units. If the behaviour of each processing unit could be pre-computed (which could be done by simulation, for example, if only a few parameters were applicable to an individual unit), then the behaviour of the plant could be embedded in an optimisation process without using any further simulation.

    Honours Project
    Investigate a gas turbine combined cycle power station to understand the number of processing units, the parameters applicable to each unit, the model associated with each unit, and the model associated with the plant (simplified as necessary to meet the project timescale). Investigate approach 2, above, and compare the results achievable with this approach with those achieved by previous researchers using the NSGA-II algorithm on the same problem.

URLs and References
http://en.wikipedia.org/wiki/Process_design
http://www.simopt.org
www.iitk.ac.in/kangal/Deb_NSGA-II.pdf

Pre- and Co-requisite Knowledge
This project is most suited to students with good mathematical and modelling skills.


Measuring the performance of multi-objective optimisation algorithms
Supervisors: Aldeida Aleti

Background
One of the main aspects of the performance of an optimisation algorithm is the current fitness of the solution(s), where the aim is to minimise/maximise its value. This is usually the way performance is measured for a single-objective algorithm, in which the solution with the best fitness value (smallest value in minimisation problems and largest value in maximisation problems) is reported for each run of the algorithm.

In a multiobjective problem, and in the absence of an a priori preference ranking of the objectives, the optimisation process produces a set of nondominated solutions, which make a trade-off between the fitness functions. As a result, the improvement made by the algorithm is expressed in multiple solutions with multiple fitness values. Measuring the performance of a multi-objective optimisation algorithm is therefore not as straightforward, since it requires aggregate measures which capture multiple fitness values from multiple solutions.
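One widely used aggregate measure is the hypervolume dominated by the nondominated set with respect to a reference point; the sketch below computes it for the two-objective case (minimisation assumed, and the points and reference are invented):

    def hypervolume_2d(front, ref):
        """Area dominated by a 2-D nondominated set when both objectives are minimised."""
        pts = sorted(front)                 # ascending in f1, hence descending in f2
        area, prev_f2 = 0.0, ref[1]
        for f1, f2 in pts:
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
        return area

    front = [(1.0, 4.0), (2.0, 2.5), (3.5, 1.0)]
    print(hypervolume_2d(front, ref=(5.0, 5.0)))   # 10.75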

Aim and Outline
In this project, we will investigate different methods and develop new metrics for assessing the performance of multi-objective optimisation algorithms.

URLs and References
http://users.monash.edu.au/~aldeidaa/

Pre- and Co-requisite Knowledge
Prior knowledge in optimisation would be helpful.


Clustering and Association Analysis for Identifying Technology & Process Innovation Potentials
Supervisors: Associate Professor Vincent Lee and Dr Yen Cheung

Background:
In the quest for sustainable growth, industrial firms have to identify potentially disruptive processes or technologies during continuous innovation search. Patent data sets are semi-structured and embedded with rich, rare topics. Measuring the homogeneity and heterogeneity of patents can lead to the discovery of potential technology or process innovation opportunities.

Aim and Outline
1. This project aims to use data mining tools to develop patent clusters;
2. Analyse the associations among patent variants to identify potential technology and/or process innovations for new market development.

URLs and References
[1]. Chiu, T.F., Hong, C.F., & Chiu, Y.T.: Exploring Technology Opportunities in an Industry via Clustering Method and Association Analysis. In C. Badica et al. (Eds.), Lecture Notes in Artificial Intelligence, 8083, pp. 593-602, (2013)
[2] Chiu, T.F.: A Proposed IPC-based Clustering Method for Exploiting Expert Knowledge and its Application to Strategic Planning, Journal of Information Science, pp. 1-17 (online 18 October 2013).
[3] Weka Data mining tool
[4] Runco, M. A and Acar, S. (2013), Divergent Thinking as an Indicator of Creative Potential, Creativity Research Journal, http://www.tandfonline.com/loi/hcrj20

Pre- and Co-requisite Knowledge
Some knowledge of the WEKA data mining tool for clustering and the discovery of knowledge from text documents using similarity measures.


Agile Smart Grid Architecture
Supervisors: Associate Professor Vincent Lee and Dr Ariel Liebman

Background
Many multisite industrial firms have to respond to the call for reductions in CO2 emissions in their business and production process operations. The incorporation of heterogeneous local renewable energy sources (wind, solar etc.) and energy storage capacity into their electricity distribution grid brings a greater degree of uncertainty, which demands timely reconfiguration of the grid architecture to optimise overall energy consumption.

Aim and Outline
The project aims to:
Analyse and evaluate (using a simulation tool) the various feasible agile smart grid architectures, their communication protocols and control schemes.

URLs and References
[1] Jason Bloomberg, The Agile Architecture Revolution, 2013, John Wiley and Sons Press, ISBN 978-1-118-41787-4 (ebook)
[2] IEEE Transactions on Smart Grid

Pre- and Co-requisite Knowledge
Some knowledge of graph-theory-based algorithm development for sensor networks.


A Predictive Cyber Security Cost Model for Financial Services Sector
Supervisors: Associate Professor Vincent Lee

Background
Intensive competition in the global digital economy has given rise to escalating cyber and physical systems crimes by malicious data miners who aim to gain personal and institutional competitive advantages. Malicious insiders exist in all industries. Amongst all reported cybercrimes, those committed by malicious insiders in the financial services sector are among the most significant threats to networked systems and data. This is reflected in the fact that many enterprises report that more than 50% of the cybercrimes they experience derive from malicious insiders [1]. A malicious insider is a trusted insider (e.g. current employee, contractor, customer, or business partner) who abuses his/her trust to disrupt operations, corrupt data, ex-filtrate sensitive information, or compromise an IT system, causing loss or damage [2].

The academic research and professional practice literature focuses mainly on how to detect and how to prevent cybercrimes using security mechanisms whose use is justified mainly by past crime patterns. Quantitatively forecasting the cost impact of cybercrimes, however, remains a challenging research task.

Aim and Outline
Two main aims of this proposal are:
1. To formulate a predictive cybercrime cost model which can be applied to malicious insider attack within a financial institution; and
2. To verify the predictive power of the formulated model with empirical data.

Outline
This research project attempts to use both qualitative and quantitative approaches to enable an enterprise to estimate the cost impact of a malicious insider attack. Estimation of cost impact is central to the allocation of IT security budgets. Depending on the project outcome, funding to attend a conference is available.

URLs and References
[1] Adam Cummings, Todd Lewellen, David McIntire, Andrew P. Moore and Randall Trzeciak. “Insider Threat Study: Illicit Cyber Activity Involving Fraud in the US Financial Services Sector- A special Report”, July 2012, Software Engineering Institute, Carnegie Mellon University.
[2] Dawn Cappelli, Andrew Moore, Randall Trzeciak. “The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud)”, Chapter 3, 2012, Addison-Wesley.
[3] Vincent CS Lee and Yee-wei Law. “Cyber-Physical System risk detection and simulation”, oral presentation to The International Symposium on Cyber Security (CyberSec2013), Nanyang Technological University, 28-29 January 2013.
[4] Kim-Kwang Raymond Choo. “Cyber threat landscape faced by financial and insurance industry”, Australia’s national research and knowledge centre on crime and justice, Trends & Issues in crime and criminal justice, No. 408 February, 2011.

Pre- and Co-requisite Knowledge
Basic knowledge of economic modelling and the use of a simulation software package (MATLAB, ExtendSim).


What do employers want from graduate programmers?
Supervisors: Peter O'Donnell

Aim and Outline
This project will involve content analysis of on-line job advertisements for graduate programmers.

The aim of the project will be to systematically examine the nature of the programming jobs that are available for IT graduates. This examination should uncover the types of programming tools and languages that graduate programmers are expected to use, the typical salaries that are being offered and the nature and type of firms that are hiring graduate programmers.

Previous studies have discovered a "gap" between the skills and requirements identified in on-line job ads and the actual skills and requirements that employers require in new hires. This project will, through interviews with a sample of employers, see if that gap still exists.


Content analysis of on-line job advertisements for Business Intelligence professionals.
Supervisors: Peter O'Donnell

Aim and Outline
This project will involve content analysis of on-line job advertisements for business intelligence professionals.

The aim of the project will be to systematically examine the nature of on-line job advertisements for business intelligence professionals. On-line job advertisements provide a useful surrogate for the general business intelligence market. Analysis of job advertisements should provide interesting information to help understand which industries are active in the area of business intelligence, the types of applications that are being developed and the technologies that are being used.

Exploratory analysis conducted to date has revealed some interesting hypotheses about the Australian BI market place that can be explored in this project. For example, the U.S.-based TDWI classification of professional roles in the BI area seems not to be relevant when classifying Australian BI roles, which appear to be more general and multi-skilled; very few BI roles in Australia focus on data modelling or data quality, and very few require skills in user interface design or data visualisation.


Online Handwritten Signature Verification
Supervisors: Gopal Gupta

Background
This project is in an area of my earlier research. A number of research papers were published.

Aim and Outline
There is considerable interest in authentication based on handwritten signature verification (HSV) because HSV is superior to many other biometric authentication techniques, for example fingerprints or retinal patterns, which are more reliable but much more intrusive.

A number of HSV techniques have been explored over the last 30-40 years. I myself have looked into using dynamic parameters when a signature is signed online. Another approach was based on simulating the hand movements of the person signing online. Both these techniques work well and the results have been published. This project involves finding even more reliable techniques, perhaps by exploring yet another approach to online HSV based on identifying curves and straight lines as the signature is signed. You will need to study curve and line identification techniques in pattern recognition and then use those techniques, and perhaps develop new ones, for use in online HSV. The project requires some mathematical knowledge and programming experience.
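As a small example of the dynamic information an online signature carries, the sketch below derives velocity and position-extrema features from timestamped pen positions (assuming numpy; the sample points are invented and the capture device is assumed to supply (x, y, t) triples):

    import numpy as np

    # Hypothetical online signature: pen position sampled at times t (seconds).
    t = np.array([0.00, 0.01, 0.02, 0.03, 0.04])
    x = np.array([0.0, 1.2, 2.0, 2.3, 2.1])
    y = np.array([0.0, 0.4, 1.5, 2.8, 3.9])

    # Dynamic parameters: instantaneous velocity and speed along the pen trajectory.
    vx, vy = np.gradient(x, t), np.gradient(y, t)
    speed = np.hypot(vx, vy)

    # Simple shape feature: local extrema of x(t), in the spirit of position-extrema methods.
    x_extrema = [i for i in range(1, len(x) - 1)
                 if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]
    print(speed, x_extrema)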

URLs and References
(with R. C. Joyce) Using Position Extrema Points to Capture Shape in On-line Handwritten Signature Verification, Pattern Recognition, Vol. 40, pp. 2811-2817, 2007

The State of the Art in On-line Handwritten Signature Verification, approx. 38pp, Faculty of Information Technology, Technical Report, 2006

Pre- and Co-requisite Knowledge
The student must have some mathematical background and experience in programming.


Automatic Sound Generation and Recognition
Supervisors: Gopal Gupta


Background
I have no research publications in this area, but it is related to an industry project.

Aim and Outline
Some applications require techniques that automatically recognise sounds. Other applications (for example, in digital films) require techniques that can generate any given sound for use in the application. The first part of this project involves developing techniques for sound recognition that detect a wide variety of sounds. The sound recogniser should work even when there is a high level of ambient noise. The other part of the project is to develop techniques to synthesise sounds, so that any of a number of given sounds may be generated from a given list. Once again it may be necessary to generate sounds for a noisy background environment. The project will involve searching for publications in this area and then perhaps using some ideas from them. The project requires some mathematical knowledge and programming experience.
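Most sound recognition pipelines start from a time-frequency representation; the sketch below computes a magnitude spectrogram with numpy and reads off the dominant frequency (the synthetic tone plus noise is a placeholder for real recordings):

    import numpy as np

    fs = 16000                                       # sample rate in Hz
    t = np.arange(0, 1.0, 1 / fs)
    signal = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)

    frame, hop = 512, 256
    window = np.hanning(frame)
    frames = [signal[i:i + frame] * window
              for i in range(0, len(signal) - frame, hop)]

    # Magnitude spectrogram: one FFT per windowed frame; rows = frames, columns = frequency bins.
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))
    print(spectrogram.shape)

    peak_bins = spectrogram.argmax(axis=1)
    print("dominant frequency (Hz):", np.median(peak_bins) * fs / frame)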

URLs and References
http://www.ee.columbia.edu/~dpwe/talks/HSCMA-2011-06.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.964&rep=rep1&type=pdf


The impact of using Business Intelligence in healthcare (18 pts)
Supervisors: Caddie Gao and Frada Burstein

Background
Business Intelligence (BI) tools are widely used in many data-intensive organisations to support better decision-making and service delivery. BI tools have recently been adopted as one of the components of the information infrastructure within the healthcare context as well. However, the impact of using BI on healthcare outcomes still needs investigation.

Aim and Outline
The project will examine BI tool developments in a large Australian hospital and their impact on the business, using a case study approach.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or to be a student in the Master of Business Information Systems. Some work experience will be considered a bonus.


Visualising Lazy Functional Program Execution
Supervisors: Tim Dwyer and Chris Mears

Background
Pure lazy functional programming languages such as Haskell remain the most advanced programming paradigm in common use. Laziness and functional purity allow the compiler to optimise code in much more sophisticated ways than standard imperative language compilers. Haskell syntax is also, arguably, a more natural and concise way to model problems and algorithms for solving them. However, the difficulty for programmers in these types of languages is understanding what is actually happening with such compiled, optimised code when it is executing. This is a serious blocker to wider adoption of the pure functional paradigm. While a lot of the research in functional languages over the years has been devoted to language and compiler design, it seems less effort has gone into developing really “user friendly” practical tools for developers. There is some recent work in this direction (see links to IHaskell and ghc-vis) however much work remains in making such tools informative and interactive.

Aim and Outline
To develop novel interactive visualisation tools to support programmers in understanding the efficiency and memory use of running Haskell programs.

URLs and References
http://www.haskell.org/haskellwiki/Haskell
http://gibiansky.github.io/IHaskell/demo.html
http://hackage.scs.stanford.edu/package/ghc-vis-0.2.1

Pre- and Co-requisite Knowledge
Some knowledge of functional programming would be very useful (ideally Haskell, but Lisp/ML/etc. are also a good foundation). An interest in graphics, visualisation and software usability would also be advantageous.


Qualitative investigation of software development team behaviour
Supervisors: Robert Merkel, Robyn McNamara, Narelle Warren (TBC)

Background
The overwhelming majority of academic software engineering research has been highly technical in nature, and focused on the development of tools and methodologies to assist in some particular aspect of the development process, such as formal specification methods, or test case generation tools. Even where attempts have been made to formally study the human aspects of software engineering, structured quantitative approaches have been most common. To explore human behaviour, the social sciences typically use a combination of qualitative and quantitative approaches. Qualitative methods including observation and interviews offer the chance to generate new insights about a domain - in this case software development - which then, if desired, can be investigated further using quantitative approaches.

Aim and Outline
The aim of this project is to gain insight into the work patterns, interactions, and key challenges facing a software development team in its task. It is hoped that this will reveal some of the similarities and differences between the team's nominal development process and what they actually do in practice.

At this stage, it is planned to use a mixed-methods qualitative approach, with a period (of 1-2 weeks) of unstructured participant observation of an industrial software development group followed up with semi-structured interviews. Such an approach is similar to ethnographic methods used in anthropology.

URLs and References
Kanij, T.; Merkel, R.; Grundy, J., "A Preliminary Study on Factors Affecting Software Testing Team Performance," Empirical Software Engineering and Measurement (ESEM), 2011 International Symposium on, pp. 359-362, 22-23 Sept. 2011, http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6092588&tag=1
http://www.umanitoba.ca/faculties/arts/anthropology/courses/122/module1/methods.html

Pre- and Co-requisite Knowledge
Some background in software engineering units is required. Training and guidance in qualitative research methods will be provided.


Democratising Big Data: Public Interactive Displays of Sustainability Data
Supervisors: Lachlan Andrew, Tim Dwyer, Ariel Liebman, Geoff Webb

Background
We have access to potentially finely-grained data on energy use around the University and particularly in some key new buildings. We would like to create interesting interactive visualizations that allow people to explore this data.

Aim and Outline
One possibility is that we set up a public display that can be controlled by passers-by using a Microsoft Kinect interface. Another (complementary) possibility is that we design a mobile or web app that allows people to explore this data on their own device. The point is to raise people's awareness of energy usage and of efforts to improve the sustainability of buildings at Monash. The HCI (Human Computer Interaction) research goal is to explore how novel interactive visualization and effective UI design can engage casual observers.

URLs and References
http://intranet.monash.edu.au/bpd/services/environmental-sustainability.html

Pre- and Co-requisite Knowledge
This project should appeal to students with an interest in graphics and natural user interface design.


How close are they? Conflict of interest in Academia
Supervisors: Lachlan Andrew

Background
Peer review is central to the health of scientific publishing. This requires that the reviewers of a scientific paper be sufficiently independent of the authors. For example, the reviewer should not be a current collaborator, a former student or supervisor, or be working at the same institution.

However, this is not always clear-cut. What if they published together 10 years ago? What if they used to work at the same institution? What if they have a close collaborator in common?

Aim and Outline
This project will develop software to determine whether an academic has a "conflict of interest" with any of the authors of a document. It will use public databases such as Google Scholar and the Mathematics Genealogy Project to determine how "close" a candidate reviewer is to the authors of a candidate paper.
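A minimal sketch of the 'closeness' computation, assuming a co-authorship graph has already been scraped from sources such as Google Scholar (networkx is used here, and the names and edges are invented):

    import networkx as nx

    # Invented co-authorship graph: an edge means the two people have published together.
    G = nx.Graph()
    G.add_edges_from([
        ("Reviewer", "Colleague A"), ("Colleague A", "Author X"),
        ("Reviewer", "Former Supervisor"), ("Author Y", "Colleague B"),
    ])

    def closeness(reviewer, author):
        """Co-authorship distance; small values suggest a potential conflict of interest."""
        try:
            return nx.shortest_path_length(G, reviewer, author)
        except nx.NetworkXNoPath:
            return float("inf")

    print(closeness("Reviewer", "Author X"))   # 2: they share a close collaborator
    print(closeness("Reviewer", "Author Y"))   # inf: no known connection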

URLs and References
http://scholar.google.com
http://genealogy.math.ndsu.nodak.edu/

Pre- and Co-requisite Knowledge
Independent problem solving skills


Evaluating the ability of solar energy to meet peak electricity demand
Supervisors: Lachlan Andrew, Kevin Korb, Ariel Liebman

Background
In Australia, customers with solar panels are billed using "Nett metering", which means that the retailer is only told the difference between the amount of energy used and the amount generated, but not the actual amount generated. Separating these two is important for many reasons, such as estimating the growth in total electricity consumption and determining how much solar energy contributes at times of greatest need.

Aim and Outline
This project will combine multiple sources of data to estimate both the local generation and total consumption of each customer in each half-hour interval. In particular, it will use real-time data obtained from smart meters to determine the instantaneous nett use of individual customers, and data measured from a small subset of solar installations to estimate the available solar energy at different locations at different times.

For a given amount of available solar energy, the amount of power generated by a solar panel depends on its orientation. The research component of this thesis is to implement a new procedure to estimate the orientation of each customer's panels. This will allow an accurate estimate of the generation at each installation.

URLs and References
http://intranet.monash.edu.au/bpd/services/environmental-sustainability.html

Pre- and Co-requisite Knowledge
Fluency with first-year level maths. Ability to fill in gaps in a loosely specified algorithm


Where does my electricity go?
Supervisors: Reza Haffari, Lachlan Andrew, Ariel Liebman

Background
Have you ever wondered why your electricity bill is high in a particular month? Smart meters have the potential to tell us which devices consume most of our electricity, but we must coax the information out of them. Smart meters report half-hourly energy use to your retailer, but can also distribute finer time-scale data over a wireless LAN. We would like to "disaggregate" this data and determine how much energy is used by individual devices, e.g. air conditioning, fridges, heating, cooking etc. This raises awareness of usage patterns, and hence can potentially reduce electricity consumption significantly.

Aim and Outline
In this project, we will design and develop machine learning techniques suitable for analysing and mining electricity usage data. The ideal model will be able to accommodate other sources of valuable information as well, e.g. time of day, season, and temperature records. In particular, we will explore a powerful statistical model, called Factorial Hidden Markov Models (FHMMs), and augment it with additional components to capture domain knowledge. We will make use of publicly available data in this project (REDD data set from MIT: http://redd.csail.mit.edu).
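The additive structure that an FHMM exploits can be seen in a toy generative sketch (each appliance is an independent on/off chain and the meter observes only the sum plus noise; this illustrates the model's assumptions rather than an inference algorithm, and the appliance parameters are invented):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 96                                           # e.g. one day at 15-minute resolution

    # Hypothetical appliances: (power draw in watts when on, probability of switching state).
    appliances = {"fridge": (150, 0.3), "heater": (2000, 0.05), "oven": (1800, 0.02)}

    states = {}
    for name, (_, p_switch) in appliances.items():
        s = np.zeros(T, dtype=int)
        for t in range(1, T):
            s[t] = 1 - s[t - 1] if rng.random() < p_switch else s[t - 1]
        states[name] = s

    # The smart meter observes only the aggregate signal (plus measurement noise);
    # disaggregation is the task of recovering the hidden per-appliance states.
    aggregate = sum(power * states[name] for name, (power, _) in appliances.items())
    aggregate = aggregate + rng.normal(scale=20, size=T)
    print(aggregate[:5].round(1))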

URLs and References
http://redd.csail.mit.edu

Pre- and Co-requisite Knowledge
Basic probability


Finding Monash's heating and cooling costs
Supervisors: Lachlan Andrew, Geoff Webb, Tim Dwyer, Ariel Liebman

Background
We have access to potentially finely-grained data on energy use around the University and particularly in some key new buildings. We would like to "disaggregate" this data further, to identify how much energy is used by different systems, specifically HVAC (heating, ventilation and air conditioning), but also lighting or office equipment.

Aim and Outline
In this project, we will use a combination of manual sleuthing and data mining techniques to determine what component of Monash's electricity consumption is due to heating and cooling. This will combine the above data set with hourly temperature measurements to try to detect the times at which a building's air conditioning or heating turns on or off, and the power consumption while it is on. The resulting data will be useful for raising awareness about which energy-saving strategies are likely to produce substantial savings. This data will ideally also form the input to a data visualisation project to convey this data to the wider campus community.
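One simple way to attribute consumption to heating and cooling is a degree-day regression against outdoor temperature, as in the sketch below (invented daily data; real work would use the finer-grained meter and temperature feeds described above, and the 18 degree set-point is only an assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    temp = rng.uniform(5, 35, size=120)                      # daily mean temperature (deg C)
    base = 18.0                                              # assumed comfort set-point

    hdd = np.maximum(base - temp, 0)                         # heating degree days
    cdd = np.maximum(temp - base, 0)                         # cooling degree days
    energy = 500 + 30 * hdd + 45 * cdd + rng.normal(scale=40, size=temp.size)  # synthetic kWh

    # Least-squares fit: energy ~ baseline + heating coefficient * HDD + cooling coefficient * CDD
    A = np.column_stack([np.ones_like(temp), hdd, cdd])
    baseline, heat_coef, cool_coef = np.linalg.lstsq(A, energy, rcond=None)[0]
    print(round(baseline), round(heat_coef, 1), round(cool_coef, 1))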

URLs and References
http://intranet.monash.edu.au/bpd/services/environmental-sustainability.html

Pre- and Co-requisite Knowledge
Basic probability. Understanding Fourier transforms would be an advantage


Planning for an uncertain energy future
Supervisors: Aldeida Aleti, Ariel Liebman

Background
Electricity grids around the world, including in Australia, are in the midst of a profound transformation. New technologies such as rooftop solar panels, wind farms, and smart meters are challenging current paradigms in system planning and even threatening existing electricity utility business models.

Aim and Outline
Electricity utilities, system planners, and governments are facing many future trends that are extremely uncertain. For example, there is a great deal of uncertainty about electricity demand growth (or decline), compounded by uncertainty in the rate at which renewable technology costs decline. This project aims to develop optimisation techniques to model the impacts of uncertainty in demand growth, technology costs, and electricity generation feedstocks on optimal investment strategies in renewable technologies in an electricity system. The project will take some of its inspiration from the work done by the CSIRO Future Grid Forum.

Pre- and Co-requisite Knowledge
The project will appeal to students with an interest in simulation and modelling and some programming experience. No prior knowledge of optimisation or energy systems is required.


Predicting Tennis Performance
Supervisors: Kevin Korb and Michael Bane (Tennis Australia)

Background
Tennis Australia is interested in improving its assessment of Australian tennis prospects in order to maximise its use of its training and financial resources. It has about 35 years of data on rankings and performance and other data on match statistics and physiological data.

Aim and Outline
Build and test statistical models of long-term performance of tennis athletes. We will use Bayesian net classifiers and other Bayesian network models to find good predictors of performance.

Tennis Australia is offering a $1000 scholarship to the selected Honours student.

URLs and References
KB Korb and AE Nicholson (2011) Bayesian Artificial Intelligence, 2nd ed. CRC Press.

Pre- and Co-requisite Knowledge
A knowledge of statistics, machine learning or data mining would be useful.


Software Tools for the Manipulation of 3D Image Content
Supervisors: Peter Tischer and Carlo Kopp

Background
A software engineering background would be advantageous but is not essential.

Aim and Outline
Computer displays and televisions built for the presentation of 3D (stereoscopic) content are now becoming commodity hardware products, rather than specialised research support products. The market for 3D products is driven mostly by the entertainment sector, and in computer displays, the dominant market for 3D displays and supporting graphics adaptors is the computer gaming market, with several hundred products now capable of driving a 3D display. Responding to the market, hardware vendors are now producing 3D LED displays for desktop and deskside computers, or high performance notebooks with 3D LED displays and infrared emitters for 3D glasses built into the notebook. Concurrently, stereoscopic (3D) cameras are re-emerging in the market, with two products now available. While hardware for displaying 3D content is now becoming affordable and common, there is a shortage of software tools available for manipulating 3D content, especially 3D digital imagery produced by stereo cameras. Extant proprietary software tools like Photoshop, Aperture, Lightroom, Optics Pro, and GPL tools like GIMP lack proper support, or provide at best rudimentary support for 3D imagery.

The aim of this project is to produce an Open Source software tool capable of converting, parallax adjusting, displaying and exporting imagery in common stereoscopic formats. The tool must be highly portable, and integration with GIMP would be highly desirable. The overall objective of the project is to produce an image editor for 3D images. The initial implementation will enable operations to be carried out on 3D images or the fields of stereoscopic images. The tool should be designed in such a way that it can be turned into a full editing tool that will allow operations to be carried out on components of images. It is expected the prototype will operate on nVidia 3D adaptors and 120 Hz 3D monitors.
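As an indication of the kind of low-level operation involved, the sketch below adjusts the parallax of a stereo pair by shifting one view horizontally and cropping both views to match (numpy only; a real tool would also need to read and write the stereoscopic file formats mentioned above):

    import numpy as np

    def adjust_parallax(left, right, shift):
        """Shift the right view horizontally by `shift` pixels and crop both views to match."""
        if shift > 0:
            right = right[:, shift:]
            left = left[:, :right.shape[1]]
        elif shift < 0:
            right = right[:, :shift]
            left = left[:, -right.shape[1]:]
        return left, right

    rng = np.random.default_rng(0)
    left = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
    right = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

    new_left, new_right = adjust_parallax(left, right, shift=12)
    print(new_left.shape, new_right.shape)   # both cropped to the same width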

Pre- and Co-requisite Knowledge
A student will not need to have studied image processing or computer graphics at undergraduate level.


Complex and Clever Metadata Architectures for Research Data Management
Supervisors: Joanne Evans, Yuan-Fang Li, Tom Denison, Henry Linger

Background
Metadata is one of the conundrums facing those designing and developing advanced scholarly information infrastructures to facilitate distributed, data intensive, collaborative research. Any infrastructure for heterogeneous data sharing must be able to cope with a plethora of metadata ontologies, schemas, standards, representations and encodings, as one-size-fits-all approaches suffer in terms of metadata quality and usefulness. They do not provide the degree of specificity needed for efficient and effective discovery, access, interpretation and use.

Aim and Outline
This research project will investigate the design of a metadata management architecture for a research hub that can cope with complexity and diversity, mapping and managing commensurability, as well as allowing for necessary specialisation and extension, in order to facilitate sharing and re-use of research data.


Simulating batteries in smart grid
Supervisors: Vincent Lee, Ariel Liebman, John Betts

Background
Electricity grids around the world and in Australia are in the midst of a profound transformation. New technologies such as rooftop solar panels, wind farms, and smart meters are challenging current paradigms in system planning and even threatening existing electricity utility business models.

Aim and Outline
This project aims to model the integration of batteries into the smart grid using cloud based high performance computing. The model incorporates an industry standard power system simulation tool called Plexos configured to find the optimal investment in renewable generation technologies in a complex electricity network. The project will entail incorporating models of a range of new battery technologies to determine whether batteries can significantly improve the cost of investing in renewable and other low carbon energy technologies.

URLs and References
http://www.csiro.au/Organisation-Structure/Flagships/Energy-Flagship/Future-Grid-Forum-brochure.aspx

Pre- and Co-requisite Knowledge
This project will appeal to students with an interest in simulation, business decision making and modelling. No prior knowledge of optimisation or energy systems is required.


Beating the World Record on Freight Transport Problems
Supervisors: Mark Wallace, Richard Kelly

Background
The vehicle routing problem with time windows has been tackled by groups all over the world using all kinds of optimisation approaches. These approaches are validated and compared against a set of problem instances published at SINTEF. The major optimisation company Quintiq publishes their best results on their website.

Aim and Outline
The recent method of guided ejection chains has been successful on several transport applications, and another method based on large neighbourhood search was used in Richard Kelly's recent PhD to obtain world-class results, with a couple of world records. This project will experiment with a combination of guided ejection search and large neighbourhood search to obtain new world records.

The expected outcomes of this project are:
* New world-best results on vehicle routing with time window benchmarks;
* A publication comparing our results with Quintiq's;
* An efficient implementation of guided ejection search.
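
To make the approach concrete, the sketch below is a generic large neighbourhood search (destroy/repair) skeleton of the kind that would be combined with guided ejection search in this project. The cost, destroy and repair functions are placeholders: in a VRPTW setting, destroy would remove a subset of customers from their routes and repair would re-insert them (for example via a guided ejection chain) while respecting time windows and vehicle capacities.

    # A minimal large neighbourhood search (LNS) skeleton. cost(), destroy() and
    # repair() are placeholders for problem-specific components; the acceptance
    # rule here is a deliberately simple "accept if not worse".
    import random

    def large_neighbourhood_search(initial_solution, cost, destroy, repair,
                                   iterations=10000, seed=0):
        rng = random.Random(seed)
        best = current = initial_solution
        best_cost = current_cost = cost(current)
        for _ in range(iterations):
            partial, removed = destroy(current, rng)      # ruin part of the solution
            candidate = repair(partial, removed, rng)     # rebuild it
            candidate_cost = cost(candidate)
            if candidate_cost <= current_cost:            # simple acceptance rule
                current, current_cost = candidate, candidate_cost
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
        return best, best_cost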

URLs and References
Quintiq World Records: http://www.quintiq.com/optimization/vrptw-world-records.html
SINTEF VRPTW Benchmarks: http://www.sintef.no/Projectweb/TOP/VRPTW/
Nagata, Y., & Bräysy, O. (2009). A powerful route minimization heuristic for the vehicle routing problem with time windows. Operations Research Letters, 37(5), 333-338.
Pisinger, D., & Ropke, S. (2007). A general heuristic for vehicle routing problems. Computers & operations research, 34(8), 2403-2435.

Pre- and Co-requisite Knowledge
Experience in programming in C++, an interest in optimisation technology, and the desire to win!


ERP in the Cloud by SMEs
Supervisors: Sue Foster


Background

Little research has been conducted to assess the effectiveness, or otherwise, of SMEs adopting ERP in the cloud.

Aim and Outline
To identify the critical issues that affect organisations that run their ERP systems in the cloud.

URLs and References
Klause, H. & Rosemann, M. (2000). What is enterprise resource planning? Information Systems Frontiers (special issue of The Future of Enterprise Resource Planning Systems), 2 (2), 141-162.
Lewis, P. J. (1993). Linking Soft Systems Methodology with Data-focused Information Systems Development, Journal of Information Systems, Vol. 3, 169-186.
Markus, M.L., Axline, S., Petrie, D., & Tanis, C. (2000) Learning from adopters' experiences with ERP: problems encountered and success achieved. Journal of Information Technology , 15, 245-265.
Nolan, & Norton Institute. (2000). SAP Benchmarking Report 2000, KPMG Melbourne.
Queensland Health Corporate Publications: Change management Documents: Located at http://www.health.qld.gov.au/publications/change_management/
Parr., A. & Shanks, G. (2000). A model of ERP project implementation. Journal of Information Technology, 15, 289-303.
Ross, J. W. (1999). "The ERP Revolution: Surviving Versus Thriving". Centre for Information Systems Research, Sloan School of Management, MA, August 1999.
Scott, J. E., & Vessey, I. (2002). Managing risks in enterprise systems implementations. Communications of the ACM, April, Vol. 45, No 4. Retrieved on 19 March 2010,
Located at: http://delivery.acm.org/10.1145/510000/505249/p74-scott.pdf?key1=505249&key2=8269509621&coll=GUIDE&dl=GUIDE&CFID=80880926&CFTOKEN=57269991
Sedera, D., Gable, G., & Chan., T. (2003). Measuring Enterprise Systems Success: A Preliminary Model. Ninth Americas Conference on Information Systems, 476-485.
Shang, S., & Seddon, P. B. (2002). Assessing and managing the benefits of enterprise systems: the business manager's perspective. Information Systems Journal. 12, pp 271-299.
Shang, S. & Seddon, P. B. (2000). "A comprehensive framework for classifying the benefits of ERP systems" in the proceedings of the twenty third Americas Conference on Information Systems. 1229-1698.
Skok, W., & Legge, M. (2001). Evaluating Enterprise Resource Planning (ERP) Systems using an Interpretive Approach. ACM SIGCPR, San Diego, 189-197.
Sumner, M. (2000). "Risk factors in enterprise-wide/ERP projects." Journal of Information Technology 15(4): 317 - 327.
Titulair, H. B., Oktamis, S., and Pinsonneault, A. (2005). Dimensions of ERP implementations and their impact on ERP Project outcomes. Journal of Information Technology Management. XVI, 1. Located at http://jitm.ubalt.edu/XVI-1/article1.pdf

Pre- and Co-requisite Knowledge
Enterprise information systems knowledge would be an advantage


SOA implementation benefits, barriers and costs
Supervisors: Sue Foster


Background
Little research has been conducted to assess the implementation barriers, benefits or costs of using Service Oriented Architecture

Aim and Outline
To identify the issues that affect organisations adopting SOA

URLs and References
http://www.health.qld.gov.au/publications/change_management/
Parr., A. & Shanks, G. (2000). A model of ERP project implementation. Journal of Information Technology, 15, 289-303.
Ross, J. W. (1999). "The ERP Revolution: Surviving Versus Thriving". Centre for Information Systems Research, Sloan School of Management, MA, August 1999.
Scott, J. E., & Vessey, I. (2002). Managing risks in enterprise systems implementations. Communications of the ACM, April, Vol. 45, No 4. Retrieved on 19 March 2010,
Located at: http://delivery.acm.org/10.1145/510000/505249/p74-scott.pdf?key1=505249&key2=8269509621&coll=GUIDE&dl=GUIDE&CFID=80880926&CFTOKEN=57269991
Sedera, D., Gable, G., & Chan., T. (2003). Measuring Enterprise Systems Success: A Preliminary Model. Ninth Americas Conference on Information Systems, 476-485.
Shang, S., & Seddon, P. B. (2002). Assessing and managing the benefits of enterprise systems: the business manager's perspective. Information Systems Journal. 12, pp 271-299.
Shang, S. & Seddon, P. B. (2000). "A comprehensive framework for classifying the benefits of ERP systems" in the proceedings of the twenty third Americas Conference on Information Systems, 1229-1698.

Pre- and Co-requisite Knowledge
Enterprise information systems knowledge would be an advantage


Extending the ERP system beyond the organisational boundaries
Supervisors: Sue Foster


Background
Most research is conducted within the ERP system itself; however, ERP systems now extend beyond organisational boundaries to establish links with vendors, sellers and a variety of other stakeholders.

Aim and Outline
To identify the critical issues impacting on organisations that extend their ERP systems beyond organisational boundaries

URLs and References
ACC (1984). ERP implementations and their issues. Proceedings of the Australian Computer Conference, Sydney, Australian Computer Society, November Edn.
Barati, D. Threads of success and failure in business process improvement. Located at http://www.isixsigma.com/library/content/c070129a.asp
Managing Barriers to business Reengineering success located at:
http://www.isixsigma.com/offsite.asp?A=Fr&Url=http://www.prosci.com/w_0.htm
Roseman, M. (2001). Business Process Optimisation: Making Process Re-engineering Actually Work. Coolong Consulting (Australia) Pty Ltd.
Bingi, P. Sharma M.K. and Godla J.K. (1999). "Critical Issues Affecting an ERP Implementation", Information Systems Management, Vol. 16, 3, 7-14.
Boyle., T. A., & Strong, S. E. (2006). "Skill requirements of ERP graduates." Journal of Information Systems Education 17(4): 403-412.
Curran, T. A., & Ladd, A. (2000). SAP R/3: business Blueprint: Understanding Enterprise Supply Chain Management (2nd Edn). Sydney: Prentice Hall Australia Pty, Ltd.
Davenport, T. H. (2000a). Mission critical: Realising the promise of enterprise systems. Boston: Harvard Business School Press.
Davenport, T. H. (2000b). The future of enterprise system-enabled organisations. Information Systems Frontiers (special issue of The future of Enterprise Resource Planning Systems Frontiers), 2(2), 163-180.
Davenport (1998). Putting the enterprise into the enterprise system. Harvard Business Review. July-August 1998.
Davenport, T. H., (1990). The New Industrial Engineering: Information Technology and Business Process Redesign, Sloan Management Review, 31(4), Summer, 11.
Francoise, O., Bourgault., M. & Pellerin, R. (2009). ERP implementation through critical success factors' management. Business Process Management Journal, 15(3), 371-394.
Hammer, M. (1990). Reengineering Work: Don't Automate, Obliterate. Harvard Business Review, July-August 1990.

Pre- and Co-requisite Knowledge
Enterprise information systems knowledge would be an advantage


Adaptive Genetic Algorithms in Search-Based Software Engineering
Supervisors: Aldeida Aleti


Background
Software testing is a crucial part of software development. It enables quality assurance, such as correctness, completeness and high reliability of the software systems. Current state-of-the-art software testing techniques employ search-based optimisation methods, such as genetic algorithms to handle the difficult and laborious task of test data generation. Despite their general applicability, genetic algorithms have to be parameterised in order to produce results of high quality. Different parameter values may be optimal for different problems and even different problem instances. In this project, we will investigate a new approach for generating test data, based on adaptive optimisation. The adaptive optimisation framework will use feedback from the optimisation process to adjust parameter values of a genetic algorithm during the search.
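
As a small illustration of feedback-driven parameter control, the sketch below adapts the mutation rate of a simple (1+1)-style evolutionary loop on a toy problem. The 1/5th-success-rule-style adaptation and the OneMax toy fitness function are illustrative assumptions only; the project would adapt a full genetic algorithm for test-data generation.

    # Minimal sketch of adaptive parameter control: the mutation rate is adjusted
    # from search feedback (increase on success, decrease on failure) while
    # maximising the number of 1s in a bit string. Illustrative only.
    import random

    def onemax(bits):
        return sum(bits)

    def adaptive_search(n=100, generations=2000, seed=1):
        rng = random.Random(seed)
        parent = [rng.randint(0, 1) for _ in range(n)]
        rate = 1.0 / n                           # initial mutation rate
        for _ in range(generations):
            child = [1 - b if rng.random() < rate else b for b in parent]
            if onemax(child) >= onemax(parent):
                parent = child
                rate = min(0.5, rate * 1.1)      # success: explore a little more
            else:
                rate = max(1.0 / n, rate * 0.9)  # failure: be more conservative
        return parent, rate

    if __name__ == "__main__":
        best, final_rate = adaptive_search()
        print(onemax(best), round(final_rate, 4))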

Aim and Outline
The goal of this project is to evaluate (based on simulations and realistic examples) adaptive genetic algorithms in the context of specific software engineering problem(s). The specific tasks are:
- Understand the current approaches in search based software testing and adaptive genetic algorithms
- Perform an experimental evaluation of adaptive genetic algorithms for software testing

URLs and References
http://users.monash.edu.au/~aldeidaa

Pre- and Co-requisite Knowledge
Programming skills in Java or C++, Understanding of, or willingness to learn, the software engineering and statistical foundations needed for the project.


Predicting Clicks on Product Recommendation Interruptions
Supervisors: Mark Carman


Background
Online retailers often recommend content to users based on the purchases of other users, together with the user's previous purchases. Some retail platforms prompt the user with such suggestions, and thus must decide when best to interrupt the user with a new message, (since too frequent prompting can annoy the user). This project aims to estimate the usefulness of the message to the user by estimating the probability that a user will click on a particular product recommendation message.

Aim and Outline
The aim of the project is to build well calibrated models for estimating the probability that a user will click on a product recommendation, based on their purchase history, demographic information, current context, etc.. The project will make use of real data from an online retailing platform.
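
The sketch below shows the basic shape of such a model: a logistic model of click probability fitted by gradient descent on synthetic data, followed by a crude calibration check (predicted versus observed click rate per probability bin). The features and data here are synthetic stand-ins; real features would come from purchase history, demographics and context.

    # Minimal sketch: a calibrated logistic click-probability model on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5000, 4
    X = rng.normal(size=(n, d))                       # stand-in user/context features
    true_w = np.array([0.8, -0.5, 0.3, 0.0])
    p_true = 1.0 / (1.0 + np.exp(-(X @ true_w - 1.0)))
    y = rng.binomial(1, p_true)                       # simulated clicks

    w = np.zeros(d)
    b = 0.0
    for _ in range(2000):                             # batch gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= 0.5 * (X.T @ (p - y) / n)
        b -= 0.5 * np.mean(p - y)

    p_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    bins = np.clip((p_hat * 10).astype(int), 0, 9)    # 10 probability bins
    for k in range(10):
        mask = bins == k
        if mask.any():
            print(f"bin {k}: predicted {p_hat[mask].mean():.2f}, observed {y[mask].mean():.2f}")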

URLs and References
An example of user click prediction is given in this paper: http://www.wsdm-conference.org/2010/proceedings/docs/p351.pdf

Pre- and Co-requisite Knowledge
Good understanding of maths/stats.


Detection of interesting patterns via the duality of user/message metadata on Twitter
Supervisors: Marc Cheong


Background
There is a large amount of hidden metadata (data-about-data) available on Twitter, generated by users in their day-to-day activity on the microblogging site. With the increasing awareness of online privacy, the question of "What inferences or possible real-world patterns can we glean from a collection of metadata harvested on Twitter?" intrigues researchers. Several inference algorithms and case studies have already been developed in Cheong's PhD thesis (2013) and tested on a 10GB dataset of real-world Twitter metadata; however, many improvements can be made in light of Twitter's evolution over the past couple of years.

Aim and Outline
The aim of this project is to investigate new approaches to Twitter metadata analysis. This might be done by improving existing algorithms or creating new ones (based on theories in e.g. HCI, social science, etc) and evaluating the effectiveness in modelling/studying a real-world phenomenon.

URLs and References
Inferring social behavior and interaction on Twitter by combining metadata about users & messages (PhD thesis). <http://arrow.monash.edu.au/vital/access/manager/Repository/monash:120048>

Pre- and Co-requisite Knowledge
Knowledge in basic statistics, data mining techniques, social media.


Creating an extensible framework for automated solving of cryptic crosswords using machine learning and natural language techniques
Supervisors: Robyn McNamara, David Squire

Background
Cryptic crosswords are commonly found in newspapers all around the globe, from the British Guardian to our very own The Age and the Herald Sun. However, one of the challenges for a human learning such crosswords is the steep learning curve involved, as well as the inside knowledge required to decipher (or parse) a clue.

Currently, there is a scarcity of computer-based approaches to parse cryptic crossword clues, let alone solve entire puzzles! A few such papers were written decades ago, such as Williams & Woodhead (1979) and Smith & du Boulay (1986). Commercial solvers such as William Tunstall-Pedoe's Crossword Maestro do exist; however the algorithms used in such solvers are proprietary.

Aim and Outline
Realising this niche, this project aims to create an extensible framework for the automated solving of cryptic crossword clues (and by extension, an entire cryptic crossword grid). This framework should ideally be plugin-based, to allow for extensibility in e.g. handling new clue types. The proposed solution could use existing sources of semantic relations between words, e.g. the Natural Language ToolKit (NLTK), or WordNet.
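
To illustrate the plugin idea, the sketch below implements one hypothetical clue-type handler with a minimal interface: an anagram handler that matches candidates whose letters rearrange the "fodder" word. The indicator set, word list and interface are illustrative assumptions, not the project framework; WordNet or NLTK could supply the vocabulary instead of the toy list shown.

    # Minimal sketch of a plugin-style clue handler for one clue type (anagrams).
    from collections import Counter

    class AnagramHandler:
        INDICATORS = {"scrambled", "confused", "mixed", "broken", "rearranged"}

        def __init__(self, wordlist):
            self.words = wordlist

        def solve(self, clue_words, answer_length):
            """Return candidate answers if the clue looks like an anagram clue."""
            if not self.INDICATORS.intersection(w.lower() for w in clue_words):
                return []
            candidates = []
            for fodder in clue_words:
                if len(fodder) == answer_length:
                    target = Counter(fodder.lower())
                    candidates += [w for w in self.words
                                   if len(w) == answer_length
                                   and Counter(w.lower()) == target
                                   and w.lower() != fodder.lower()]
            return candidates

    if __name__ == "__main__":
        words = ["listen", "silent", "enlist", "tinsel"]          # toy word list
        handler = AnagramHandler(words)
        print(handler.solve("Listen carefully when confused".split(), 6))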

URLs and References
Related papers on Google Scholar <http://scholar.google.com.au/scholar?q=computer+cryptic+crosswords>

Williams, P. W. and Woodhead, D. (1979). Computer assisted analysis of cryptic crosswords. <http://comjnl.oxfordjournals.org/content/22/1/67.abstract>

Pre- and Co-requisite Knowledge
Knowledge of (or willingness to learn) cryptic crosswords is a must. From the technical standpoint: use of the NLTK (or similar) libraries, and a good knowledge of data structures and search algorithms.


Does virtualisation save energy?
Supervisors: Lachlan Andrew


Background
Energy-related expenses account for around half the cost of a data centre, and so minimising energy use is an important aspect of data centre management. A popular tool for achieving this is virtualisation, in which one powerful computer emulates multiple smaller computers. This allows many lightly-utilised computers, all consuming power, to be replaced by a small number of highly-utilised computers.

However, virtualisation is not free. Emulation has substantial overheads, which are often ignored.

Aim and Outline
This project will involve measuring the performance (speed per watt) of virtualised hosts running on different hardware and comparing it with the performance of native execution. The final outcome will be a design guide telling operators when it is beneficial to virtualise and when it is not.
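
The core comparison behind such a design guide is sketched below: throughput per watt for native versus virtualised runs of the same benchmark. The numbers are placeholders; in the project they would come from benchmark timings and a power meter.

    # Sketch of the performance-per-watt comparison. Measurements are placeholders.
    measurements = {
        # configuration: (operations completed, elapsed seconds, average watts)
        "native":      (1_200_000, 60.0, 180.0),
        "virtualised": (1_050_000, 60.0, 165.0),
    }

    for config, (ops, seconds, watts) in measurements.items():
        throughput = ops / seconds                 # operations per second
        efficiency = throughput / watts            # operations per joule
        print(f"{config:12s} {throughput:10.0f} ops/s  {efficiency:8.1f} ops/J")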

URLs and References
http://computer.howstuffworks.com/server-virtualization1.htm
http://greeneconomypost.com/virtualization-data-center-energy-efficiency-19009.htm

Pre- and Co-requisite Knowledge
Requires general programming skills.
Will provide knowledge of:

  • experimental design
  • data centre energy management
  • writing code to interface to measurement equipment

Clear text in JPEGs
Supervisors: Lachlan Andrew and Mark Carman

Background
JPEG is an inexact ("lossy") compression standard, which introduces errors. For photographs, these errors are usually insignificant, but for line drawings and text, they appear as smudges around crisp edges.

A naive way to remove these smudges is to convert all grey pixels in black-and-white line drawings to either pure black or pure white.
However, it is common for the lines in the original image to have grey edges to make the lines look smooth ("anti-aliasing").
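
The naive baseline described above is only a few lines of code, sketched below with numpy and Pillow; it removes smudges but also destroys intentional anti-aliasing, which is exactly the shortcoming the optimisation-based decoder in this project aims to avoid. The file names are illustrative.

    # The naive baseline: snap every pixel of a decoded JPEG of a black-and-white
    # drawing to pure black or pure white with a hard threshold at mid-grey.
    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("scanned_text.jpg").convert("L"), dtype=np.uint8)
    cleaned = np.where(img < 128, 0, 255).astype(np.uint8)
    Image.fromarray(cleaned).save("scanned_text_thresholded.png")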

Aim and Outline
This project will develop software for removing speckles around text and lines in JPEG images.

It will pose the decoding as an optimization problem. (For a given compressed file, it will find the image with the maximum number of background-coloured pixels that could possibly have been compressed into that file.) This will result in a much clearer image than the traditional decoding technique. If time permits, the resulting algorithm will be implemented as a plug-in for Chrome and Firefox.

URLs and References
https://en.wikipedia.org/wiki/Jpeg

Pre- and Co-requisite Knowledge
Programming skills (C/C++ and/or Matlab preferred)
Basic mathematics (Knowledge of Fourier transforms helps)


Probabilistic Methods for Information Retrieval
Supervisors: Prof Wray Buntine

Background
In the world of Information Retrieval, BM25, a variant of TF-IDF, is king. "Language models for information retrieval" have been developed as an alternative but offer an incremental improvement at best, primarily because the models are mostly unigram. Richer predictive models would look at word interactions and could offer improvements.
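
For reference, the BM25 baseline mentioned above is sketched below in its standard form, score(D,Q) = sum over q in Q of IDF(q) * f(q,D)*(k1+1) / (f(q,D) + k1*(1 - b + b*|D|/avgdl)), over a few toy documents; k1 and b take their usual default values.

    # Minimal BM25 scoring sketch over toy documents.
    import math
    from collections import Counter

    def bm25_scores(query, docs, k1=1.2, b=0.75):
        tokenised = [d.lower().split() for d in docs]
        N = len(tokenised)
        avgdl = sum(len(d) for d in tokenised) / N
        df = Counter()
        for d in tokenised:
            df.update(set(d))                      # document frequencies
        scores = []
        for d in tokenised:
            tf = Counter(d)
            s = 0.0
            for q in query.lower().split():
                idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
            scores.append(s)
        return scores

    print(bm25_scores("information retrieval",
                      ["probabilistic models for information retrieval",
                       "a survey of language models",
                       "retrieval of information from large text collections"]))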

Aim and Outline
To explore richer predictive models of text in the language modelling style and evaluate their performance on some standard collections. We have some probabilistic methods in mind here. An initial study would abandon computational considerations and test out different predictive models for retrieval performance ignoring cost.

URLs and References
The BM25 model is in "The Probabilistic Relevance Framework: BM25 and Beyond" at http://dl.acm.org/citation.cfm?id=1704810

Pre- and Co-requisite Knowledge
FIT3080 Intelligent systems or equivalent is a prerequisite, and knowledge of probability and/or experimental computer science. Good programming experience (the code is in C).


Visualisation Applications for "ContextuWall" in the Monash CAVE2
Supervisors: Dr Tim Dwyer and Prof Falk Schreiber


Background
Immersive Analytics is about creating computer software and hardware that support collaborative analysis, decision making, design and data understanding by providing an immersive multi-sensory user experience in which the users can directly interact with their data or design. It provides a powerful, natural interface to analytics software such as simulation, optimisation and data mining.

Aim and Outline
Immersive Analytics aims to create novel, natural ways for people to explore and interact with complex data. This project aims to repurpose Monash's $1.9 Million CAVE2 facility to better support data analysis. It will develop gesture-based interaction and large display methods in the CAVE2.

URLs and References
http://monash.edu.au/cave2

Pre- and Co-requisite Knowledge
Programming skills (ideally one or more of: Python, C#, C++, Java)


Visualising Biological Pathways in Cola.js
Supervisors: Dr Tim Dwyer and Prof Falk Schreiber


Background
Visualisation of biological processes and networks is increasingly important, and graphical standards are available to support knowledge representation in the biosciences (Systems Biology Graphical Notation, SBGN).

Aim and Outline
The future of computing is the web, and HTML5 now offers a complete platform for building rich interactive applications. Cola.js (A.K.A. 'WebCoLa') is an open-source JavaScript library developed by researchers in our Faculty for arranging HTML5 documents and diagrams. This project will extend Cola.js to visualise biological networks and cellular processes.

URLs and References
www.sbgn.org

Pre- and Co-requisite Knowledge
Programming skills (ideally Javascript and HTML5), systems biology standards (SBGN, SBML)


Verification and validation of open APIs for banking
Supervisors: Yuan-Fang Li and Robert Merkel

Background:
Banks have a complex IT infrastructure with very high reliability, robustness, and security requirements. Many banks are currently developing open application programming interfaces (APIs) to make banking functionality available more flexibly, both within and across organisation boundaries. These open APIs interact in a variety of complex ways, and without proper quality assurance measures, such interactions may have undesirable and costly consequences.

Aim and Outline
In this project, we propose to combine formal methods and software testing techniques to model, verify and validate open banking APIs and their interactions. Modeling and checking the APIs will help to show the fundamental soundness of the APIs - or reveal potentially serious design flaws, if they exist. Then, the model can be used to efficiently test systems that implement the modeled APIs, thus giving confidence that the systems under test implement the APIs correctly. This project is supported by ANZ Bank and will have a focus on ANZ banking systems.

URLs and References
http://mit.bme.hu/~micskeiz/pages/modelbased_testing.html
http://openbankproject.com/en/

Pre- and Co-requisite Knowledge
Having studied FIT3013 or equivalent would be an advantage.


Shadow Business Intelligence (18 pts)
Supervisors: David Arnott and Caddie Gao

Background

Shadow IT is a reality in modern organisations and comprises IT applications and infrastructure that exist outside the boundaries and control of an organisation's formal IT structure, irrespective of whether that structure is centralised or decentralised, or whether it is insourced or outsourced. Shadow IT has been estimated as comprising up to half of an organisation's IT capability. It has been enabled by the collapse in the cost of hardware, software, and networks, as well as increased IT education across all business disciplines. No research has been conducted into shadow BI and no industry source has claimed it even exists.

Aim and Outline
The aim of the project is to investigate the existence and nature of shadow BI.

This project could be conducted using a case study or a survey.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.


Galactic archaeology using Minimum Message Length
Supervisors: David Dowe and Prof. John Lattanzio


Background

We consider astronomical data of stars from GALAH/HERMES as given by the stars' relative chemical concentrations. The ratios of concentrations of different chemical elements give some idea about the generation of stars.

Aim and Outline
We seek patterns in the data of chemical element concentrations in terms of clustering the stars into groups and also in terms of finding different ratios of concentrations in the various groups. We do this using the Bayesian information-theoretic Minimum Message Length (MML) principle of machine learning. The work will involve the statistical technique of latent factor analysis and the technique from statistics and machine learning of mixture modelling and clustering.
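
As a simpler illustrative stand-in for the mixture-modelling step, the sketch below clusters synthetic "abundance ratio" data with a Gaussian mixture fitted by EM (scikit-learn) and chooses the number of groups by BIC. This is only an analogy for the approach: the project itself would use MML/Snob rather than BIC, and real abundance ratios rather than the synthetic data generated here.

    # Illustrative stand-in: EM mixture modelling of synthetic abundance ratios.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic data: three stellar groups in a 2-D abundance-ratio space.
    X = np.vstack([rng.normal([-0.5, 0.2], 0.05, size=(300, 2)),
                   rng.normal([0.0, 0.0], 0.08, size=(300, 2)),
                   rng.normal([0.3, -0.3], 0.06, size=(300, 2))])

    best = min((GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 7)),
               key=lambda gm: gm.bic(X))
    print("chosen number of groups:", best.n_components)
    print("group sizes:", np.bincount(best.predict(X)))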

URLs and References
C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer

Pre- and Co-requisite Knowledge
Good marks in university mathematics-related or statistics-related subjects, at least to first-year level, and an ability at or interest in mathematics. Likewise, knowledge of or interest in astronomy.


Extended Database Normalisation Using MML
Supervisors: David Dowe and Nayyar Zaidi

Background
Database Normalisation is typically done in order to avoid update, insertion and deletion anomalies - essentially making sure that any stored data won't be lost and that any update, insert or delete operations only have to be done in one place. The normalisation depends upon the (so-called) business rules. But, in certain situations, the attributes might not have such helpful names as StudentId and SubjectId, and the business rules won't all be known. In such situations, it might be necessary to infer both the business rules and the normalisation. Using Minimum Message Length (MML), this has already been done as far as third normal form (3NF).

Aim and Outline
We take this to higher normal forms, and then apply this on larger data-sets.

We also explore what sort of other machine learning and statistical techniques (other than MML) might be able to infer these normal forms.

URLs and References
David L. Dowe and Nayyar A. Zaidi (2010), "Database Normalization as a By-product of Minimum Message Length Inference", Proc. 23rd Australian Joint Conference on Artificial Intelligence (AI'2010) [Springer Lecture Notes in Artificial Intelligence (LNAI), vol. 6464], Adelaide, Australia, 7-10 December 2010, Springer, pp82-91.

C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer.

Pre- and Co-requisite Knowledge
Good marks in university mathematics-related or statistics-related subjects, at least to first-year level, and an ability at or interest in mathematics. Also, satisfactory completion of at least one database-related subject.


Effects of automation on employment and society
Supervisors: David Dowe

Background
Going back perhaps as far as the printing press, automation has changed employment and society. Advances in technology in more recent decades see computers not only outperforming humans at tasks once thought to be the preserve of humans, but also increasingly performing jobs which many thought only humans could ever do. What will be the impact on employment and employment levels? Which careers are safer? What are likely impacts on society? While many have been anticipating the technological singularity (when machines are purported to become smarter than humans) at least as far back as Solomonoff (1967) and Good (1965), whenever that does or doesn't come, the effects of job displacement seem to be arriving with increasing rapidity.

Aim and Outline
To build upon the given references and other studies to address these questions of employment and society, initially in an Australian context. This work will possibly be partly supported financially by a branch of the Australian Government.

URLs and References
David H. Autor (3/Sept/2014), "Polanyi's Paradox and the Shape of Employment Growth" (47 pages)

Frey and Osborne (2013), "The future of employment: how susceptible are jobs to computerisation?"

"Humans Need Not Apply" [https://www.youtube.com/watch?v=7Pq-S557XQU, published on 13 Aug 2014]

Tali Kristal (2013), "The Capitalist Machine: Computerization, Workers' Power and the Decline in Labor's Share Within U.S. Industries", American Sociological Review, 78 (3), pp361-389.

R. J. Solomonoff (1967), "Inductive Inference Research Status Spring 1967".

Pre- and Co-requisite Knowledge
A knowledge of I.T. and automation, the ability to read and understand the given references. This includes having a sufficient background in mathematics and statistics to understand the relevant analyses. A knowledge of economics or sociology would be a bonus.


Big Data Processing: An Industry Case Study with the Railway Institute
Supervisors: Assoc Prof David Taniar

Background
Are you interested in learning Big Data? Big Data is a multi-million dollar industry. This project focuses on processing large data volumes, and it is a collaboration with the Railway Institute at Monash, which provides large datasets about railways. You will work with A/Prof David Taniar as well as a team from the Railway Institute, Monash.

Aim and Outline
This project aims to solve big data problems by developing programs to answer queries in a timely manner. You will program using MapReduce/Hadoop and the latest technologies, such as Spark.
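
The PySpark sketch below shows the general shape of such a query. The file path and column names (train_id, delay_minutes) are hypothetical; the real datasets and queries would come from the Railway Institute.

    # Minimal PySpark sketch of an aggregate query over a (hypothetical) railway CSV.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("railway-demo").getOrCreate()

    df = spark.read.csv("hdfs:///data/railway/train_movements.csv",
                        header=True, inferSchema=True)

    # Average delay per train, largest first.
    (df.groupBy("train_id")
       .agg(F.avg("delay_minutes").alias("avg_delay"))
       .orderBy(F.desc("avg_delay"))
       .show(20))

    spark.stop()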

URLs and References
http://en.wikipedia.org/wiki/Big_data
http://en.wikipedia.org/wiki/Apache_Hadoop
http://en.wikipedia.org/wiki/MapReduce

Pre- and Co-requisite Knowledge
A strong background in programming and databases.


Music Database Processing
Supervisors: Assoc Prof David Taniar

Background

Do you play any classical music instruments? Do you want to combine your advanced musical skills with computer science? This project analyses classical music using computer science techniques.

Aim and Outline
This project aims to process and analyse classical music recordings, including sonata form analysis, chord progression, and concerto identification. You will need to learn the basics of signal processing, and Matlab.
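
As a first taste of the signal processing involved, the sketch below estimates the dominant pitch of a short audio frame with an FFT and maps it to the nearest note name. The frame is a synthetic 440 Hz tone rather than a recording, and the example is in Python for illustration even though the project itself suggests Matlab.

    # Minimal sketch: dominant-pitch estimation of one frame via FFT (numpy).
    import numpy as np

    sample_rate = 44100
    t = np.arange(0, 0.1, 1.0 / sample_rate)         # 100 ms frame
    frame = np.sin(2 * np.pi * 440.0 * t)            # synthetic A4 tone

    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum)]

    names = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
    semitones = int(round(12 * np.log2(peak_hz / 440.0)))
    print(f"peak at {peak_hz:.1f} Hz, nearest note: {names[semitones % 12]}")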

URLs and References
https://dl.dropboxusercontent.com/u/15183317/25981366_FIT1016_FinalReport.pdf

Pre- and Co-requisite Knowledge
You must be an intermediate music instrument player (e.g. minimum level 5 or 6 piano, violin/cello, brass, woodwind).


Learning from non-stationary distributions
Supervisors: Geoff Webb

Background
The sheer volume and ubiquity of data in the information age demand increasingly effective technologies for data analysis. Most online data sources are non-stationary: factors bearing on their composition change over time, as do relations among those factors. But nearly all machine-learning algorithms assume invariance, which greatly reduces their usefulness.

Aim and Outline
This Project will be the first comprehensive investigation of emerging technologies for learning from non-stationary distributions, guided by the insight that subgroups change in different ways, at different times, and at different speeds. Outcomes will include robust, tested, and reliable data analytics for non-stationary data - enabling far more efficient use of big data, with countless real-world applications.

URLs and References
G.I. Webb (2014). Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data. In Proceedings of the 14th IEEE International Conference on Data Mining. http://www.csse.monash.edu/~webb/Files/Webb14.pdf

Pre- and Co-requisite Knowledge
FIT2004 Algorithms and data structures or equivalent


Constraint-based Mission Planning for Fleets of Unmanned Aerial Vehicles
Supervisors: Jan Carlo Barca, Guido Tack and Mark Wallace

Background
The ability to efficiently allocate resources and thereby reduce cost is a major concern when Unmanned Aerial Vehicles are employed in real-world application areas.

Aim and Outline
This project aims to address this issue by developing constraint-based software solutions that enable human operators to determine the number of aerial vehicles required to carry out complex operations, subject to time and fuel constraints. There may also be an opportunity to visualise the mission plans and to perform interactive optimisation of the plan using state-of-the-art visualisation technology available at Caulfield School of Information Technology.

URLs and References
M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf

C. Ramirez-Atencia, G. Bello-Orgaz, M.D. R-Moreno, D. Camacho, (2014) "A simple CSP-based model for Unmanned Air Vehicle Mission Planning," Proceedings of 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp.146,153.

Pre- and Co-requisite Knowledge
Advanced programming experience is essential.


Self-organised routing in congested networks
Supervisors: Julian Garcia, Bernd Meyer

Background
How to best route traffic in a large communication network without central control? One potential solution involves systems of multiple agents with independent agendas. In non-cooperative routing games a group of self-interested agents is tasked with routing traffic through a congested network. The task of each agent is to choose a specific route for a share of the resource to be transported. An "equilibrium" plan emerges when all agents are happy with their choice (i.e., no agent has incentives to choose otherwise). The emerging plan provides a solution to the problem [1]. The core of the research in this area focuses on studying the properties of such stable plans by using game theory. This project will depart from that perspective by trying to understand how agents can dynamically adjust their choices to arrive at feasible plans. The approach that we will undertake is inspired by evolutionary game theory [2]. Our starting point is a set of results by Fischer and Vocking [3], who are able to show how convergence times vary for specific networks under a range of simplifying assumptions. While these assumptions allow for mathematical tractability, they remove aspects that are important in application-relevant scenarios. Our results aim to provide the foundation for a new approach to manage complex data networks without requiring central control.

Aim and Outline
Our main objective is to check and understand the validity of the results of Fischer and Vocking [3] when some of their assumptions are relaxed. First, we will use large-scale computer simulations and exact numerical results to check if, and when, their results are valid. The model we start from assumes that the set of agents solving the problem is infinite, which is computationally unfeasible. We will also formulate a model that can account for the effect of finite systems. For this purpose we will use recent advances in evolutionary game theory [4] and computer simulations.
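
The sketch below illustrates the flavour of the dynamics in question: replicator dynamics for a two-route congestion game with linear latencies, under the same infinite-population assumption used in the analysis of Fischer and Vocking. The latency functions are made up for illustration, and the actual models studied in [3] are richer than this.

    # Replicator dynamics for a toy two-route congestion game (illustrative only).
    def latency1(x):            # latency on route 1 when a fraction x uses it
        return x

    def latency2(y):            # latency on route 2 when a fraction y uses it
        return 0.5 * y + 0.25

    x = 0.9                     # initial fraction of traffic on route 1
    dt = 0.01
    for step in range(5000):
        c1, c2 = latency1(x), latency2(1 - x)
        x += dt * x * (1 - x) * (c2 - c1)   # route 1 grows while it is cheaper
        if step % 1000 == 0:
            print(f"t={step * dt:5.1f}  share on route 1: {x:.3f}")

    print("equilibrium share on route 1:", round(x, 3))   # converges to 0.5 here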

URLs and References
[1] Roughgarden, Tim, Eva Tardos, and Vijay V. Vazirani. Algorithmic game theory. Vol. 1. Cambridge: Cambridge University Press, 2007.
[2] Weibull, Jorgen W. Evolutionary game theory. MIT press, 1997.
[3] Fischer, Simon, and Berthold Vocking. "On the evolution of selfish routing."Algorithms-ESA 2004. Springer Berlin Heidelberg, 2004. 323-334.
[4] Nowak, Martin A. Evolutionary dynamics. Harvard University Press, 2006.

Pre- and Co-requisite Knowledge
An interest in evolutionary dynamics, self-organization and multi-agent systems is required. Solid problem-solving and computer programming skills are also necessary. An inclination to maths and analysis of large datasets is advantageous. Python and/or Mathematica (for analytics) is a plus.


Profiling Public Transport Users on Social Media.
Supervisors: Sue Bedingfield, Marc Cheong and Kerri Morgan

Background
Social media is the lens of society. If we want to know what people think about issues ranging from politics to current affairs, social media provides us with some insight into the views of individuals.

Aim and Outline

The aim of this project is to identify types of Public Transport (PT) users who are active on social media, their views on the PT system, and to measure their engagement with the system. In particular, it will identify groups of people who are concerned about specific areas of the PT system. The source of information about public transport users will be social media data from Twitter. A list of seed users will be gathered in two ways: first, by identifying all official Victorian Public Transport Twitter accounts and then the users who communicate with these accounts; and second, by identifying anyone who has tweeted about public transport in Victoria.

Two types of information will be gathered: the metadata of the users (available from Twitter) and text data from the tweets themselves obtained by text mining using RapidMiner or other software. The types of users commenting on public transport will then be profiled using clustering techniques on each of these domains. Cross-clustering techniques may be used to integrate these results.

This project will be of benefit to stakeholders of the PT system and can be generalised to other areas of public interest.

Pre- and Co-requisite Knowledge
Potential students should have skills in data mining and statistics, a basic understanding of the Victorian PT system, and an interest in applying these skills to social media analysis. Interested students, please talk to Kerri, Marc and Sue.


Visualising Data with Bayesian Networks
Supervisors: Kevin Korb, Ann Nicholson and Alan Dorin

Background


Omnigram Explorer is a visualisation tool for interactively exploring relationships between variables using very large data sets. It is designed to help researchers gain a holistic, qualitative understanding of their data, which might highlight relationships that warrant further quantitative investigation. It is also a useful tool to allow non-specialists to explore the behaviour of complex systems. It currently supports Bayesian nets (BNs) as an alternative way of looking at the (causal) relationships between variables; however, it does not interact with BNs -- i.e., it only interrogates static data already produced by a Bayesian network or any other source.

Aim and Outline
This project will connect Omnigram Explorer to a Bayesian network program using its API so that it becomes possible to use Omnigram's visualisation techniques while observing variables and updating the BN in real time. This will allow new ways of interacting with BNs.

URLs and References
Tim Taylor (2014). Omnigram Explorer User Documentation. Monash University.
Tim Taylor (2014). Omnigram Explorer Technical Documentation. Monash University.

Pre- and Co-requisite Knowledge
Knowledge of or the ability to learn: Java programming, XML, Bayesian net programming, the Processing programming environment.


Biological Complexity
Supervisors: Kevin Korb, Alan Dorin

Background
A long-standing problem of artificial life and theoretical biology is how to understand and measure the complexity of life and evolution. In this project we will use a simple artificial life model of evolution as the means for testing different measures of biocomplexity.

The approach taken by most who are concerned about biodiversity is simply to count the number of species (or families, genera, etc.) present at any given time. Slightly more sophisticated approaches attempt to use information-theoretic measures of complexity. The simplest of these, identifying biocomplexity with entropy, is the most popular, but fails to accord with basic apparent facts of biology, infamously, for example, equating the complexity of living organisms and their dead relatives (McShea, 1993). A minority has applied information theory in more sophisticated ways, including Tononi, Edelman and Sporns (1998) and their followers working on "integrated information theory".
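
For concreteness, the simplest information-theoretic measure mentioned above (Shannon entropy of a species-abundance distribution) is sketched below. The abundance counts are illustrative; in the project they would come from the state of the ALife simulation.

    # Shannon entropy of a species-abundance distribution (illustrative counts).
    import math

    def shannon_entropy(counts):
        total = sum(counts)
        props = [c / total for c in counts if c > 0]
        return -sum(p * math.log2(p) for p in props)

    even_community   = [25, 25, 25, 25]        # four equally abundant species
    skewed_community = [97, 1, 1, 1]           # one dominant species

    print(shannon_entropy(even_community))     # 2.0 bits
    print(shannon_entropy(skewed_community))   # about 0.24 bits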

Aim and Outline
We will review the most popular measures of biocomplexity, including those from information theory, and implement them in the context of a simple ALife evolutionary simulation, putting them to the test. The result will be a better understanding of how they relate to each other and what their relative merits and demerits are, which we plan to publish.

URLs and References
Korb, K. B., & Dorin, A. (2011). Evolution unbound: releasing the arrow of complexity. Biology & Philosophy, 26(3), 317-338.

McShea, D. W. (1993). Evolutionary change in the morphological complexity of the mammalian vertebral column. Evolution, 730-740.

Daniel McShea and Robert Brandon (2010). Biology's First Law: The Tendency for Diversity and Complexity to Increase. University of Chicago Press.

Tononi, G., Edelman, G. M., & Sporns, O. (1998). Complexity and coherency: integrating information in the brain. Trends in cognitive sciences, 2(12), 474-484.

Pre- and Co-requisite Knowledge
Ability to program; ability to think analytically.


Promoting Healthy Dietary Habits by Lifelogging with Google Glass
Supervisors: Tim Dwyer; Marc Cheong

Background
Google Glass is a wearable eyeglass computer which provides capabilities such as augmented reality, positional sensors (e.g. location/movement), image capture, and gesture-based interaction. Research on Glass has mainly dealt with applications in mission-critical areas such as assisted surgery, telemedicine, and measuring vital signs.

One potential area of research is the use of Glass to actively lifelog food, alcohol, and nicotine consumption to promote a healthy lifestyle. There are existing mobile apps (e.g. alcohol calculators and diet planners) and fitness wearables (e.g. the Fitbit), but none have the ubiquity and versatility of the Glass platform. An added bonus of the Glass platform is its non-intrusiveness, which allows for user discretion (e.g. measuring and regulating alcohol consumption at a social event). Proposals for using Glass for fitness and wellbeing have been alluded to and hypothesised about on tech blogs, but no concrete implementation has been found thus far.

Aim and Outline
We propose the use of Glass to develop a lifelogging system to promote health and wellbeing by tracking consumption patterns. Features of this system may include, but are not limited to:
- Alcohol consumption patterns: helps the user pace/regulate drinking and promote safe driving habits. Glass can capture the alcohol label (standard drinks, % a.b.v., or even serving size), and reminds the user to tap/issue voice commands every time a drink is consumed. This logs alcohol consumption per session (helping the user measure alcohol intake for safe driving) and also monitors long-term consumption patterns; a worked example of the standard-drinks arithmetic follows this list.
- Food and Calorie tracking: by capturing food/nutritional labels, barcodes, (or time permitting - image recognition), this helps the user calculate and track nutritional intake on a daily basis (e.g. avg. caloric intake, % of RDA etc).
- Nicotine consumption patterns: helps the user track the amount of nicotine consumed (e.g. via patches, or cigarettes) on a daily basis, useful for a nicotine user who is thinking of quitting.
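
The standard-drinks arithmetic behind the alcohol-logging feature is sketched below. It assumes the Australian definition of a standard drink (10 g of pure alcohol) and an ethanol density of roughly 0.789 g/mL; the label values in the example are illustrative.

    # Worked example: standard drinks from serving size and % a.b.v.
    ETHANOL_DENSITY_G_PER_ML = 0.789
    GRAMS_PER_STANDARD_DRINK = 10.0

    def standard_drinks(volume_ml, abv_percent):
        grams_alcohol = volume_ml * (abv_percent / 100.0) * ETHANOL_DENSITY_G_PER_ML
        return grams_alcohol / GRAMS_PER_STANDARD_DRINK

    print(standard_drinks(375, 4.8))    # a 375 mL, 4.8% beer  -> about 1.4 drinks
    print(standard_drinks(150, 13.0))   # a 150 mL, 13% wine   -> about 1.5 drinks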

URLs and References
Google Glass Developers: https://developers.google.com/glass/
Papers on augmented reality: e.g. Azuma et al (2001), http://ieeexplore.ieee.org/document/963459/

Pre- and Co-requisite Knowledge
- Human-Computer Interaction (specifically augmented reality)
- Mobile programming using the Android API
- Basic image processing (specifically OCR) techniques


Autonomous Robotic Indoor Drone (ARID)
Supervisors: Richard Cox and Jon McCormack

Background
UAVs (Unmanned Aerial Vehicles), commonly called 'drones', are becoming popular devices for experimental robotics. Currently, drones normally require a human operator, and their level of "intelligence" or autonomy is very limited. Recent developments in low-cost computing and sensor technology mean we can experiment with drones that have far greater autonomy, such as the ability to find a person or location, avoid obstacles or "return to home" when batteries are low.

Aim and Outline
In this aerial robotics project you will build a UAV that will fly indoors. The aircraft is a helium balloon that uses an ultrasound sensor to detect altitude and thrust vectoring of propellers to maintain a constant altitude. The balloon navigates around an infra-red beacon using infra-red sensors mounted at the front, back, left and right. Turns are effected by differentially powering one or both of the blimp's two electric motors.
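
The altitude-hold behaviour described above amounts to a simple closed control loop, sketched below as a proportional controller. The hardware access functions are placeholders for the Raspberry Pi GPIO and motor-driver code the project would write, and the gain and loop rate are illustrative values that would be tuned on the real blimp.

    # Minimal sketch of an altitude-hold loop: proportional control from ultrasound range.
    import time

    TARGET_ALTITUDE_CM = 150.0
    KP = 0.01                      # proportional gain (tuned on the blimp)

    def read_ultrasound_cm():
        """Placeholder for the ultrasound range-finder driver."""
        raise NotImplementedError

    def set_vertical_thrust(level):
        """Placeholder for the motor driver; level is expected in [-1, 1]."""
        raise NotImplementedError

    def altitude_hold_loop():
        while True:
            error = TARGET_ALTITUDE_CM - read_ultrasound_cm()
            thrust = max(-1.0, min(1.0, KP * error))   # simple proportional control
            set_vertical_thrust(thrust)
            time.sleep(0.05)                           # 20 Hz control loop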

The project is described here:

http://diydrones.com/profiles/blog/show?id=705844%3ABlogPost%3A44817

One of the challenges of the project is to control the drone using the Raspberry Pi (a low-cost, credit card-sized computer on a single board). You will need to convert the Arduino design so that it works on the Pi. This presents a design challenge, as the drone can only lift a relatively small payload, which must include any sensors, computers, power supply, etc.

Extensions to the project could involve researching an alternative indoor navigation technology, perhaps a wi-fi spatial location sensing system. When the UAV is working it can become a platform for a camera (and could, for example, recognise faces or other environmental features) or could be a surface onto which designs might be projected.

URLs and References
http://diydrones.com/profiles/blog/show?id=705844:BlogPost:44817 http://raspberrypi.org

Pre- and Co-requisite Knowledge
You will need skills in electronic assembly - soldering and breadboard prototyping, together with some knowledge of the Processing programming language. The project will be based in sensiLab at Caulfield Campus, where students will have access to all necessary hardware for the project.


Shareholder Value of IT Investments
Supervisors: Vincent Lee and Yen Cheung, and ANZ supervisor (to be advised)

Background
Preferably to commence from semester 2, 2015 through to end of semester 1, 2016.

Aim and Outline
Define ways to measure the contribution of IT spend to shareholder value, a methodology or model that can be applied, and a theorem and heuristics that executives can leverage in making IT investment decisions.

Context

  • Each year $1B+ of proposed IT investments are planned for the Bank
  • Alignment to business strategy and technology strategy is measured
  • Business benefits are put forward either by estimating the CTI value or based on an imperative kind of assertion
  • Usually there is an element of 'gut feel' in which some initiatives make the cut and some do not in an executive round-table.
  • This then becomes the official list of IT initiatives which then are tasked with delivering value

Approach
Address the research questions:

  • Technology benefits of IT investments (such as reuse, simplification, agility, layering, disruption, transformative potential) are not easily measured. There is an absence of literature on the subject especially quantitative/economic based.
  • Each $ of IT investment should not only provide business benefits but improve the digital assets of the Bank which in turn would improve shareholder value/economic returns
  • Is it possible to measure IT spend contribution not only in terms of business benefit (CTI) but also in terms of improvement to the technology assets themselves as a contributor to shareholder value?
  • That is we may get business benefits but if that initiative detracts from the enterprise value of technology as an asset then we should price that in the negative, deducting from the net business benefits.
  • Conversely if we improve the value of technology as an enterprise asset then that should be priced positively.

How will the deliverable be used?

  • Making the quantitative connection between IT investment and shareholder value will help the Bank prioritise annual IT spend in addition to strategic alignment and business benefits (the GAP)
  • Adding a delta price of initiatives which have a positive or deleterious effect on the enterprise value of technology will help in prioritisation
  • Understanding what the next step might be in making the IT investment process more robust and scientific.

URLs and References
www.anz.com.au; semester 2, 2014 reading lists for FIT3051 and FIT5159 https://moodle.vle.monash.edu.au

Pre- and Co-requisite Knowledge
Student should have studied FIT3051 Decision support system for finance; and/or FIT5159 IT for financial decisions


Text Analysis of Contract Documents
Supervisors: Mark Carman

Background
ANZ enters complex legal agreements with our customers for loans and other financial products. These are captured in multi-page paper-based contract documents, the terms and conditions of which are often tailored to an individual customer's needs. ANZ is seeking to structure this data so that insights can be incorporated into ANZ's risk models.

Aim and Outline
Identify techniques for extracting and categorising contract terms from scanned documents

Pre- and Co-requisite Knowledge
Good understanding of maths/statistics. Experience in Machine Learning would be useful.


Clustering of DNA sequences to identify DNA-binding sites
Supervisors: Peter Tischer , Mirana Ramialison (Australian Regenerative Medical Institute, or ARMI), David Dowe

Background
DNA-binding proteins (transcription factors, a.k.a. master gene regulators) control the formation of our organs. They instruct each cell to become part of a specific organ by binding to specific DNA sequences.

Aim and Outline
We use clustering (and, more generally, mixture modelling) to cluster and identify the binding sites of these genes in order to decrypt the DNA code that determines the identity of a cell (e.g. heart cell, liver cell, etc.). Ultimately, this will give insight into understanding how the regeneration process occurs (e.g. how animals such as the zebrafish can regrow organs but humans can't).

The data consists of segments of DNA bases (A, C, G, T) that contain the binding site of transcription factors at an unknown location, from different tissues and species (plant, mouse, human, etc.). There is an abundance (100s of Megabytes) of relevant public data and in-house ARMI data.

We will initially analyse this with k-means clustering and then introduce Minimum Message Length (MML) and Snob.

Snob uses MML to infer a model which specifies the number of clusters (or components), the statistical parameters (e.g., mean and standard deviation) of each cluster, the size (more specifically, the relative abundance or mixing proportion) of each class, and the assignment of parts of the data into respective classes.

DNA bases might be analysed 1 base at a time or in pairs of bases, etc.
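
The initial k-means step could look something like the sketch below: each DNA segment is represented by its dinucleotide ("pairs of bases") frequencies and those vectors are clustered with k-means (scikit-learn). The sequences here are randomly generated stand-ins for real binding-site segments, and the fixed choice of two clusters is exactly what the later MML/Snob modelling would replace.

    # Sketch: dinucleotide-frequency features + k-means on synthetic DNA segments.
    import random
    from itertools import product
    import numpy as np
    from sklearn.cluster import KMeans

    PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]   # 16 dinucleotides

    def dinucleotide_freqs(seq):
        counts = {p: 0 for p in PAIRS}
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
        total = max(1, len(seq) - 1)
        return [counts[p] / total for p in PAIRS]

    rng = random.Random(0)
    # Stand-in data: GC-rich vs AT-rich segments, mimicking two binding contexts.
    segments = (["".join(rng.choices("GC", k=200) + rng.choices("AT", k=50))
                 for _ in range(30)] +
                ["".join(rng.choices("AT", k=200) + rng.choices("GC", k=50))
                 for _ in range(30)])

    X = np.array([dinucleotide_freqs(s) for s in segments])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print("cluster sizes:", np.bincount(labels))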

URLs and References
C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer.

Wallace, C.S. and D.L. Dowe (2000). MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, Statistics and Computing, Vol. 10, No. 1, Jan. 2000, pp73-83.

Haudry Y.*, Ramialison M.*, Paten B., Wittbrodt J., Ettwiller L. (2010) Using Trawler_standalone to discover overrepresented motifs in DNA and RNA sequences derived from various experiments including chromatin immunoprecipitation. (*=Co-first authors). Nature Protocols. 2010;5(2):323-34.

Pre- and Co-requisite Knowledge

Essential: Mathematics to at least 1st year university level, interest in bioinformatics.

Desired: Knowledge of bioinformatics, interest in regenerative medicine.


Wearable application with a remote sensor and actuator network
Supervisors: Manuela Jungmann, Richard Cox and Jon McCormack

Background
Wearable sensor applications designed in support of well-being are gaining in popularity. When measuring a person’s physical health these applications use sensor technology to capture the body’s vital signs (e.g. heart rate, breathing etc). In the event of a clinician using the wearable to monitor a person’s physiology, the generated data is typically sent to a remote location where it is stored for evaluation. However, the data could be processed further where a wearable application includes actuator technology to assist the person in managing their own state of well-being.

Aim and Outline
In this applied human physiology project, you will create a network organised around a group of people using sensors, actuators and Raspberry Pi 2 single-board computers. The project will require the design of a personal area network (line-of-sight transmission via e.g. ZigBee, Bluetooth, etc.) that includes the sensors, actuators and also hand-held controllers for haptic feedback (e.g. an LRA vibration motor).

The sensors might include a pulse sensor, with the wearer's heart-rate variability derived from the raw pulse signal. The actuators for this project are a micro servo and a Peltier cooling module.

The indexed sensor data is sent to the hand-held controller to provide tactile feedback and enable the person holding the controller to operate the actuators of the wearable.

You will create the high-level communication protocol for the PAN, set up the sensor/actuators, and design and implement the transmission system as well as the calibration protocol.
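
One way to derive a heart-rate-variability index from the pulse data is sketched below using RMSSD (the root mean square of successive differences of inter-beat intervals). In the wearable, the intervals would be extracted from the raw pulse-sensor signal; the values here are illustrative.

    # RMSSD heart-rate-variability index from inter-beat intervals (illustrative data).
    import math

    def rmssd(ibi_ms):
        """RMSSD over a list of inter-beat intervals in milliseconds."""
        diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
        return math.sqrt(sum(d * d for d in diffs) / len(diffs))

    inter_beat_intervals_ms = [812, 798, 830, 845, 810, 790, 805, 825]
    mean_ibi = sum(inter_beat_intervals_ms) / len(inter_beat_intervals_ms)
    print(f"RMSSD: {rmssd(inter_beat_intervals_ms):.1f} ms")
    print(f"mean heart rate: {60000 / mean_ibi:.0f} bpm")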

URLs and References
https://www.raspberrypi.org/products/raspberry-pi-2-model-b/
https://www.precisionmicrodrives.com/product-catalogue/linear-resonant-actuator
https://en.wikipedia.org/wiki/ZigBee
http://pulsesensor.com/
http://tetech.com/peltier-thermoelectric-cooler-modules/
http://hitecrcd.com/products/servos/micro-and-mini-servos/digital-micro-and-mini-servos/hs-5085mg-premium-metal-gear-micro-servo/product

Pre- and Co-requisite Knowledge
You will need skills in electronic assembly, soldering, breadboarding, digital/analogue conversion, good programming skills in Python, and a working knowledge of network technologies. The project will be developed in sensiLab at Caulfield Campus. You will be supplied with all necessary hardware for the project.


Big Data in Education: Learning Analytics using Data Mining
Supervisors: Chris Messom

Background
Big Data and the underlying technologies (MapReduce, Hadoop, Spark, etc.) are revolutionising business analytics in both the commercial and government sectors. Learning analytics refers to the data mining techniques used to support large-scale learning through learning management systems such as Moodle.

Aim and Outline
To review the current state of the art in Learning Analytics systems and identify an area that would benefit from supervised and unsupervised classification data mining techniques.

Identify relevant research questions to be answered by the study.

Implement a prototype Big Data learning analytic system that interfaces to a learning management system (such as Moodle), to address the research questions.
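As a concrete illustration of the unsupervised side of this, the following Python sketch (synthetic data, hypothetical feature names rather than Moodle's actual schema) clusters students by activity features exported from a learning management system.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Columns: weekly logins, forum posts, quiz average -- placeholders for whatever
    # features the prototype would extract from Moodle logs.
    activity = np.array([
        [25, 10, 0.85],
        [ 3,  0, 0.40],
        [18,  5, 0.75],
        [ 2,  1, 0.35],
        [30, 12, 0.90],
    ])

    features = StandardScaler().fit_transform(activity)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print(labels)   # e.g. separates "engaged" from "at-risk" students for follow-up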

URLs and References
Data Mining in Education: http://onlinelibrary.wiley.com.ezproxy.lib.monash.edu.au/doi/10.1002/widm.1075/full
Data Mining Tools: http://onlinelibrary.wiley.com.ezproxy.lib.monash.edu.au/doi/10.1002/widm.24/full
Data Mining: Practical Machine Learning Tools and Techniques (Third Edition): http://www.sciencedirect.com.ezproxy.lib.monash.edu.au/science/book/9780123748560
https://spark.apache.org/
https://hadoop.apache.org/
https://en.wikipedia.org/wiki/MapReduce
https://cloud.sagemath.com/ and https://cloud.sagemath.com/help

Pre- and Co-requisite Knowledge
Some or all of: Java, linux/unix, WEKA data mining tools, Hadoop, MapReduce, Spark, Moodle and/or Sage Maths tools.


Inference of ecological species distribution models
Supervisors: David Dowe and Prof. Lewi Stone (RMIT)

Background
Identifying how species are distributed over the landscape, interact and self-organize into food webs is a central goal in ecology. Species Distribution Models (SDMs) have become one of the fastest moving and top-ranked research fields in the ecological and environmental sciences.

Aim and Outline
This project will attempt to derive innovative statistical tools to improve our understanding of species distributions. These models predict the spatial distribution of all individuals of a particular species within its potential geographic range. The models are generally fitted to observed spatial survey data of a single species together with local measurements of environmental or geographical conditions that might potentially influence species’ occurrence or location (e.g., temperature, rainfall or elevation). Predictions of a species’ spatial distribution may then be computed under different environmental scenarios, such as modifying the SDM's environmental parameters to reflect hypothetical climate or land use changes. Having the ability to predict the likely locations of a species under different environmental scenarios is important for a wide range of conservation management and policy contexts, including the management of threatened species, assessing the impact of development scenarios, determining biodiversity “hotspots,” and predicting the likely ranges of invasive species. Very few models proposed for the analysis of these data-sets account for the effects of errors in detection of individuals, even though nearly all surveys of natural populations are prone to detection errors, which can be significant. Failure to account for imperfect detectability in models can induce bias in the parameters and predictions. This is an exciting challenge for which solutions are sought.

In this project we will be investigating and developing new statistical techniques for dealing with these problems. Minimum Message Length (MML) has the potential to revolutionise current techniques.
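As a point of reference, the sketch below (Python, synthetic data) fits the simplest kind of SDM, a logistic regression of presence/absence on environmental covariates; it deliberately ignores imperfect detection, which is exactly the limitation the project aims to address with new (e.g. MML-based) techniques.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    temp = rng.normal(20, 5, n)          # temperature at each survey site
    rain = rng.normal(800, 200, n)       # annual rainfall at each site
    X = np.column_stack([temp, rain])

    # Simulated truth: the species prefers warm, wet sites.
    logit = 0.3 * (temp - 20) + 0.005 * (rain - 800)
    presence = rng.random(n) < 1 / (1 + np.exp(-logit))

    model = LogisticRegression().fit(X, presence)
    print(model.coef_, model.intercept_)  # estimated environmental effects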

URLs and References
C. S. Wallace (2005), "Statistical and inductive inference by minimum message length'', Springer.

D. L. Dowe (2011), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science - (HPS Volume 7) Philosophy of Statistics, P.S. Bandyopadhyay and M.R. Forster (eds.), Elsevier, [ISBN: 978-0-444-51862-0 {ISBN 10: 0-444-51542-9 / ISBN 13: 978-0-444-51862-0}], pp901-982, 1/June/2011.

Pre- and Co-requisite Knowledge
Essential: Mathematics to at least 1st year university level.
Desired: Knowledge of or interest in ecology.


Weighted Support Vector Machines
Supervisors: Peter Tischer and David Albrecht

Background
In machine learning, Support Vector Machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. A supervised classification scheme is given a set of attributes and labels and tries to come up with an approximating function which, when given the inputs, tries to return the appropriate label. For example, the data might be a set of test results for a particular subject and the label might take 2 possible values: cancer or not-cancer.

A couple of important problems when developing supervised learning models are imbalanced classes and 'recency' of data. In the class imbalance problem, one class may have very many more representatives than the other. For instance, in medical data there may be many instances of people who are healthy and relatively few instances of people who have a particular disease. If 99% of people are healthy, then a classifier which always gives the label, "The person is healthy", will be right 99% of the time.

When considering 'recency' of data, the most up-to-date data is given the greatest importance and the weight of older data should decline over time.

Aim and Outline
The aim of this project is to explore different possible schemes for weighting the training data instances and to investigate the effects they have on classifier performance, both in terms of accuracy and efficiency.

In particular, the project should investigate how to weight instances to take into account the 'recency' of data and how weights can be used to deal with the class imbalance problem.
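The sketch below (Python, synthetic data) shows one simple way both ideas could be combined: class weights to counter imbalance and exponentially decaying weights for recency, passed to a standard SVM through scikit-learn's sample_weight argument. It is only a starting point for the schemes the project would explore.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    n = 200
    X = rng.normal(size=(n, 2))
    y = (rng.random(n) < 0.1).astype(int)    # ~10% positives: an imbalanced problem
    age = np.arange(n)[::-1]                 # 0 = newest instance, n-1 = oldest

    # Weight the rare class up, and down-weight older instances.
    class_w = np.where(y == 1, (y == 0).sum() / max((y == 1).sum(), 1), 1.0)
    recency_w = np.exp(-age / 50.0)          # tunable decay rate
    weights = class_w * recency_w

    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X, y, sample_weight=weights)     # weighted training instances
    print(clf.score(X, y))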

URLs and References
http://en.wikipedia.org/wiki/Support_vector_machine


Swarming and Robustness
Supervisors: Jan Carlo Barca

Background
Robustness is one of the key characteristics of swarms, but what are the factors that underpin this highly attractive quality?

Aim and Outline
This project aims to address this question by formulating mechanisms that can be used to evaluate the robustness of swarms under varying communication and control topologies. The student will work within Monash Swarm Robotics Laboratory and attend weekly meetings with researchers & students in the lab. This is a great opportunity for the selected student to learn about swarm robotics and work within a multi-disciplinary team consisting of software, mechanical and electrical engineers.

URLs and References
M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf

C. Ramirez-Atencia, G. Bello-Orgaz, M.D. R-Moreno, D. Camacho, (2014) "A simple CSP-based model for Unmanned Air Vehicle Mission Planning," Proceedings of 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp.146,153. Available: https://dl.dropboxusercontent.com/u/27121329/Mission%20Planning.pdf

Pre- and Co-requisite Knowledge
Advanced programming experience is essential.


Security vulnerabilities of Bitcoin and the mitigation
Supervisors: Joseph Liu and Ron Steinfeld

Background
Bitcoin is a payment system invented in 2008. The system is peer-to-peer such that users can transact directly without needing an intermediary. Despite its advantages, security is a big concern for such a digital currency.

Aim and Outline
Although some potential attacks on the Bitcoin network and its use as a payment system, real or theoretical, have been identified by researchers, there exist more vulnerabilities yet to be discovered. The aim of this project is to identify potential security vulnerabilities of Bitcoin and propose corresponding mitigations.

URLs and References
[1] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008. https://bitcoin.org/bitcoin.pdf

[2] Marcin Andrychowicz, Stefan Dziembowski, Daniel Malinowski, and Lukasz Mazurek. Secure multiparty computations on Bitcoin. In 2014 IEEE Symposium on Security and Privacy (SP), May 2014.

Pre- and Co-requisite Knowledge
Familiarity with the basics of digital currency would be an advantage.


BN modelling methods; Java or similar OOBN with application in Food Security
Supervisors: Ann Nicholson, Kevin Korb & David Albrecht

Background
Bayesian networks are amongst the most flexible and usable new technologies for modelling systems under uncertainty. Moreover, the applications of these models are flourishing in the applied sciences, finding many important new uses each year. However, many of these applications would benefit from support for larger models, especially those spanning large geotemporal domains, as in food security.

One approach to enabling Bayesian networks to be applied to large-scale modelling problems is to utilise an Object-Oriented approach.

Aim and Outline
The aim of this project will be to define and demonstrate the effectiveness of Object-Oriented Bayesian Networks by developing OO-type models for food security.

URLs and References
http://www2.warwick.ac.uk/research/priorities/foodsecurity

Pre- and Co-requisite Knowledge
BN modeling methods; Java or similar


Eco Campus App Database
Supervisors: David Taniar, David Albrecht, Nancy Van Nieuwenhove (Biology), Gerry Rayner (Biology)

Background
An Eco Campus App is being developed that will enable Biology students to identify and locate species on the Clayton Campus. These students will be required to report on their sightings of the species. These reports could include taking pictures, recording video, and making notes. The collection of these reports will then form the basis of an e-portfolio which will be used as part of the student's assessment.

Aim and Outline
In this project we will investigate the various issues associated with the Eco Campus App database and the e-portfolios.

For example, students may not be able to locate a particular species. This may be due to a number of factors, e.g. incorrect information in the database or not being able to correctly identify the species. This raises questions about how to maintain the integrity of the database. There are also questions regarding the most appropriate approach for storing the e-portfolios and how to utilise the information in the e-portfolios to update the Eco Campus App database.


Efficiency and effectiveness of Incremental Mutation Analysis
Supervisors: Robert Merkel

Background
Mutation testing is a well-established technique for assessing the quality of the test suite of a program. It works by automatically creating mutant versions of the software with changes such as changing operators, variable names, and the like, in such a way that the program will still compile. The test suites are then applied to the mutant versions, and the proportion of mutants which trigger a failure in the test suite is recorded. The higher this proportion - the mutation score - the better the test suite is presumed to be. Experiments have found that mutation score is a very good predictor of the ability of a test suite to detect real faults. However, the process of creating, compiling, and running the entire test suite on very large sets of mutant versions of software requires a lot of time.

In modern software development, test suite quality is often measured on an ongoing basis, as part of a process known as continuous integration. An automated test suite is used many times per day, as developers use it to check the correctness of their changes as they are added to the project's version control repository. In this circumstance, any test suite quality assessment must provide feedback to developers quickly if it is to be useful. Mutation analysis is currently too slow for use on even moderately-large projects in the context of continuous integration.

A previous honours project showed that mutation analysis could be effectively parallelized on distributed computing clusters, but the financial costs of conducting mutation analysis on a cluster can be high if conducted regularly.

PiT, a mutation analysis tool for Java, has some support for incremental analysis, in which the results of a previous execution of mutation analysis on one version of a software system are used to minimize the amount of new work required to analyze a subsequent version. The creator of PiT has proposed several additional optimizations which are not yet implemented. However, no assessment of the effect of these optimizations on the speed and accuracy of mutation analysis has been reported.

Aim and Outline
In this project, we will collect empirical evidence about the performance improvements, and effects on mutation score accuracy, of incremental mutation analysis.

The research will involve modifying an existing open source mutation analysis tool to collect additional data about internal operations, implementing (only to proof-of-concept standard) some or all of the proposed optimizations, and using the version control repositories of existing open source software projects to collect data to measure their effectiveness.
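To make the idea concrete, here is a minimal Python sketch of the core intuition behind incremental analysis (it is not PiT's actual algorithm): only mutants whose source file changed between versions are re-run, and previous results are reused for the rest.

    import hashlib

    def file_hash(path):
        """Digest of a source file, used to detect changes between versions."""
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def select_mutants(mutants, old_hashes, new_hashes):
        """mutants: list of (mutant_id, source_file); *_hashes: {source_file: digest}."""
        changed = {f for f in new_hashes if old_hashes.get(f) != new_hashes[f]}
        rerun = [m for m, src in mutants if src in changed]
        reuse = [m for m, src in mutants if src not in changed]
        return rerun, reuse

The optimizations proposed for PiT go further than this simple file-level check, which is where the empirical questions about speed and accuracy arise.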

URLs and References
Yue Jia; Harman, M., "An Analysis and Survey of the Development of Mutation Testing," Software Engineering, IEEE Transactions on , vol.37, no.5, pp.649,678, Sept.-Oct. 2011

MutPy Python mutation analysis tool: https://pypi.python.org/pypi/MutPy/0.4.0

PiT mutation analysis Tool for Java: http://pitest.org

Coles, H. Incremental analysis (PiT). http://pitest.org/quickstart/incremental_analysis/

Pre- and Co-requisite Knowledge
Useful skills for this project include:

  • working knowledge of Unix programming
  • Experience with cluster computing
  • Familiarity with version control systems.
  • Familiarity with unit testing frameworks such as JUnit or PyUnit.
  • Understanding of basic descriptive or inferential statistics

None of these are essential, but the more familarity you have with these topics, the easier initial progress is likely to be.


Machine learning algorithms in “face icon maker” system for semantic and sentiment analysis of social network data
Supervisors: Associate Professor Vincent Lee & Dr Yen Cheung

Background
MIT Computer Science or Bachelor of Software Engineering

Aim and Outline
Social networks such as Facebook and Twitter are known for their convenient and massive propagation of short text. When users interact with each other (post, comment, or @someone), they attempt to make the text more attractive to express their feelings. A face icon system is a platform for expressing those feelings. However, traditional face icon systems use non-adaptive algorithms: users have to configure their preferences and search for, post or install icons in the system, which is inconvenient, tedious and lacks the desired accuracy. This project aims to develop machine learning algorithms that explore and exploit structured and semi-structured text data from social networks for semantic and sentiment analyses, which can be used by enterprise decision makers for improving product design and service quality.
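For orientation, the following Python sketch (toy labelled examples only) shows a baseline sentiment classifier for short social-network posts; the adaptive algorithms developed in the project would need to go well beyond this, but the text-to-label pipeline is the same.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    posts = ["love this new phone", "worst service ever",
             "so happy with the update", "this app keeps crashing"]
    labels = ["pos", "neg", "pos", "neg"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(posts, labels)
    print(model.predict(["really happy with the service"]))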

URLs and References
https://css-tricks.com/icon-fonts-vs

Zhepeng Yan, Nan Zheng, Zachary G. Ives, Partha Pratim Talukdar and Cong Yu. Active learning in keyword search-based data integration. The VLDB Journal (2015) 24:611-631. DOI 10.1007/s00778-014-0374-x

Pre- and Co-requisite Knowledge
Text analytic mining, simulation software tool


Is cyber-crime operation cost predictable?
Supervisors: Associate Professor Vincent Lee & Dr Jianneng Cao (I2R, Singapore)

Background

MBIS or MIT Computer Science or Bachelor of Software Engineering (Hons)

Aim and Outline:
A body of practice-based literature has proposed estimating cyber-crime costs, generally based on annual loss equivalents, to justify cyber-crime prevention budgets for hardware and software purchase/development. Recent advances in big data analytic tools provide new insights into the predictive capability of real-time cyber-crime operational cost. This project explores how to predict enterprise-specific cyber-crime cost via big data analytic tools that exploit the descriptive, predictive and prescriptive analytics of cyber-crime data.

URLs and References
Brian Cashell, William D. Jackson, Mark Jickling, and Baird Webel (2014), "The Economic Impact of Cyber-Attacks", CRS Report for Congress, received through the CRS Web.

Howard E. Glavin (2003), “A Risk Modelling Methodology,” Computer Security Journal, vol. 19, no. 3 (Summer),pp.1-2; and Soo Hoo, How Much Is Enough? pp.4-12.

Ben Fischer (2014), “How to Calculate the Financial Impact of an Attack on your Network,” Arbor Networks Cyber-Security Summits (Oct 2014).

Arbor Networks white paper (2014), “The Risk vs. Cost of Enterprise DDoS Protection -How to Calculate the ROI from a DDoS Defense Solution,” 12 pages

Ponemon Institute Research Report (2014), “2014 Cost of Data Breach Study-Global Analysis”, May, Benchmark research sponsored by IBM

Pre- and Co-requisite Knowledge
Simulation software tool (e.g. Java Script, JADE or MATLAB)


Evolving Swarm Behaviour (2 students)
Supervisors: Dr Jan Carlo Barca and Dr Julian Garcia Gallego

Background
The ability to adapt to changing circumstances is critical to the survival of groups in nature, particularly in competitive environments.

Aim and Outline
This project will investigate the underlying mechanisms that facilitate such behavioural plasticity via evolutionary techniques. The aim is to formulate dynamics which are transferable to swarms of robots. This work will be carried out within the Monash Swarm Robotics Laboratory and the student will be given access to state-of-the-art simulation tools available in the lab.

This is a great opportunity for the selected student to work within a multi-disciplinary team consisting of computer scientists and mechanical, electrical, mechatronics and aeronautical engineers.

URLs and References
M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf

http://www.infotech.monash.edu.au/srlab/

Pre- and Co-requisite Knowledge
Advanced programming experience is essential and a desire to progress into a PhD program is preferred.



Crowd-sourcing and mining dilemmas: dishonest behaviour in online pool games
Supervisors: Julian Garcia

Background
Modern digital currencies, like Bitcoin, rely on miners who perform computational work to keep the system secure [1]. Their work is to confirm transactions, which they do by competing with each other in solving ``hash puzzles''. Participants of such systems naturally form pools, where members aggregate their power and share the rewards. It has been shown that miners can attack pools by withholding blocks. The pool under attack will share their rewards with attackers, reducing the earnings of honest participants. For two pools, the decision whether or not to attack is known as the miner's dilemma, an instance of a well-known game called the prisoner's dilemma [2]. The analysis shows that "rational" pools will always attack other pools. This analysis relies on assuming that pools are themselves agents. In reality, large open pools do not make decisions as such, but their behaviour arises from aggregated individual decisions. This is an important aspect that has been overlooked in game theoretical analyses of this problem.

Aim and Outline
This project aims to use a computational model to inspect if and when mining dilemmas can arise from individual behaviours. We will assume that players are not fully rational, but learn to adjust their behaviour using simple rules. This approach is known as evolutionary game theory. The problem at hand is general: it is not only applicable to distributed systems, but also arises in crowdsourcing contests [3] and other online applications [4].
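A minimal sketch of this kind of model is given below (Python, illustrative payoff values only): a population of boundedly rational miners repeatedly chooses to attack or stay honest and imitates better-performing peers using a standard pairwise-comparison (Fermi) update from evolutionary game theory.

    import math
    import random

    PAYOFF = {  # (my action, opponent's action) -> my payoff, prisoner's-dilemma-like
        ("honest", "honest"): 3, ("honest", "attack"): 0,
        ("attack", "honest"): 4, ("attack", "attack"): 1,
    }

    def step(population, beta=1.0):
        """One imitation update: agent i copies agent j with probability
        increasing in the payoff difference (Fermi rule)."""
        i, j = random.sample(range(len(population)), 2)
        opponent = random.choice(population)
        pi_i = PAYOFF[(population[i], opponent)]
        pi_j = PAYOFF[(population[j], opponent)]
        if random.random() < 1 / (1 + math.exp(-beta * (pi_j - pi_i))):
            population[i] = population[j]

    pop = ["honest"] * 50 + ["attack"] * 50
    for _ in range(10_000):
        step(pop)
    print(pop.count("attack") / len(pop))   # fraction of attackers after learning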

URLs and References
[1] S. Nakamoto, Bitcoin: A peer-to-peer electronic cash system.
[2] I. Eyal, “The Miner’s Dilemma,” arXiv:1411.7099 [cs], Nov. 2014.
[3] V. Naroditskiy, N. R. Jennings, P. V. Hentenryck, and M. Cebrian, “Crowdsourcing contest dilemma,” Journal of The Royal Society Interface, vol. 11, no. 99, p. 20140532, Oct. 2014.
[4] L. Luu, R. Saha, I. Parameshwaran, P. Saxena, and A. Hobor, “On Power Splitting Games in Distributed Computation: The Case of Bitcoin Pooled Mining,” 155, 2015.

Pre- and Co-requisite Knowledge
An interest in game theory, computational models and simulation. An inclination to maths and analysis of large datasets is advantageous. Solid knowledge of Python and/or C++ is a plus.


Visualizing Dynamic Bayesian Networks (DBNs)
Supervisors: Ann Nicholson, David Albrecht & Kevin Korb

Background
Bayesian network tools provide GUIs with very similar means of visualizing Bayesian nets in ways that are useful to technologists who need to build, test and operate Bayesian networks. Non-specialists generally have great difficulty in understanding these tools and dealing with the GUIs. These difficulties are considerably greater when the added complexity of dynamic BNs needs to be dealt with. As situations or scenarios develop over time, understanding how related factors are affected can be extremely difficult. In the domain of fog forecasting we have developed a specialized tool using D3 for graphically portraying how the probability of fog changes over time in Boneh, Zhang, Nicholson and Korb (2015). This project will carry forward that work.

Aim and Outline
This project will refine the existing D3 DBN visualization tool, evaluating the visualization with fog forecasters and others, using the feedback to improve the visualization. The project results will be submitted for publication in an appropriate venue.

URLs and References
Boneh, Zhang, Nicholson and Korb (2015). A Tool for Visualising the Output of a DBN for Fog Forecasting. ABNMS 2015.

Pre- and Co-requisite Knowledge
Knowledge of or the ability to learn: Java programming, D3.


Security for the Internet of Things (IoT)
Supervisors: Ron Steinfeld and Joseph Liu and Carsten Rudolph

Background:
The rapidly increasing number of devices connected to the Internet, especially small devices such as cameras, sensors, and actuators, making up the so-called Internet of Things (IoT), appears to be one of the big trends in computing for the near future. As such devices are increasingly used to collect potentially private data, as well as control critical infrastructure, the privacy and integrity of the IoT is becoming a highly important concern. Yet the massive scale of the emerging IoT, its highly distributed nature, and the low computational abilities of many IoT devices pose new challenges in attempting to devise practical solutions for IoT security problems.

Aim and Outline
The goal of this project is to explore, implement and evaluate the practicality of protocols for securing the privacy and/or integrity of large scale, highly distributed IoT networks of low-power devices.

Examples of project topics include:

  • Authentication protocols to enforce access control to IoT devices only to authorized users.
  • Encryption protocols to provide privacy for IoT sensor data (e.g. for sending over the Internet to a cloud-based encrypted database).

Practical implementation/evaluation-oriented projects will likely involve evaluating the secure protocol implementations on sample embedded hardware devices incorporating sensors, in collaboration with the Monash Dept. of Electrical and Computer Systems Engineering.
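As a starting point for the second example above, the sketch below (Python, using the widely available 'cryptography' package) applies authenticated symmetric encryption to a sensor reading before it leaves the device; part of the project would be assessing whether such primitives are light enough for constrained IoT hardware, or whether lighter-weight alternatives are needed.

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # provisioned to the device and the data owner, not the cloud server
    box = Fernet(key)

    reading = b'{"sensor": "temp-17", "celsius": 22.4}'
    token = box.encrypt(reading)  # ciphertext with a built-in integrity tag
    print(box.decrypt(token))     # only holders of the key can read or verify it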

URLs and References:
[1] http://spectrum.ieee.org/telecom/security/how-to-build-a-safer-internet-of-things

Pre- and Co-requisite Knowledge
Depending on the nature of the project topic selected, the student should have either (1) Good programming skills and/or (2) good mathematical skills, and preferably both. Familiarity with the basics of cryptography would be an advantage.


Investigation of Cryptographic Code Obfuscation
Supervisors: Ron Steinfeld

Background
Code obfuscation is the process of `hiding' the implementation of a program while preserving its functionality, which has potential applications in software IP protection, as well as a myriad of other cryptographic applications, such as efficient Broadcast Encryption. Obfuscation has traditionally been done by heuristic methods; it was only recently [1] that plausible cryptographic methods of program obfuscation were proposed, and their secure cryptographic construction is still a major research problem.

Aim and Outline
The aim of this project is to evaluate the practicality of and explore improvements to some of the new theoretical code obfuscation methods with sound security foundations, in particular the construction in [2]. Depending on student interest and capabilities, the specific goals of this project would be to evaluate the concrete memory and time requirements of these mechanisms, namely:

  1. Memory Efficiency: Estimate concrete parameter sizes for the systems in [2] required to achieve a desired security /correctness level, based on the best known attacks on these systems, and known models for behaviour of those attacks.
  2. Computational Efficiency: Evaluate the practical computational cost of the mechanism by implementing a prototype of the mechanism using efficient algorithms for the underlying mathematical computations (based on existing specialised arithmetic libraries), and evaluating its performance.

This is an opportunity for talented students to investigate state of the art cryptographic algorithms.

URLs and References
[1] S. Garg et al. Candidate Indistinguishability Obfuscation
and Functional Encryption for all circuits. Available at https://eprint.iacr.org/2013/451.pdf

[2] Z. Brakerski et al. Obfuscating Conjunctions under Entropic Ring LWE. In Proceeding
ITCS '16 Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science
Pages 147-156, ACM New York, NY, USA, 2016.

Pre- and Co-requisite Knowledge

Familiarity with the basics of cryptography would be an advantage. The student should have good mathematical and programming skills.


Encrypted Database System
Supervisor: Joseph Liu, Ron Steinfeld and David Taniar

Background
The convenience of outsourcing has led to a massive boom in cloud computing. However, this has been accompanied by a rise in hacking incidents exposing massive amounts of private information. Encrypted databases are a potential solution to this problem, in which the database is stored on the cloud server in encrypted form, using a secret encryption key known only to the client (database owner), but not to the cloud server. However, existing encrypted database systems either are not secure enough, or suffer from various functionality and efficiency overhead limitations when compared to an unencrypted database, which can limit their practicality in various applications.

Aim and Outline
The goal of this project is to explore, develop and evaluate improvements to a selected functionality and/or efficiency aspect of existing encrypted database systems, with the aim of improving their practicality. Examples include:
* Efficient implementation of an encrypted database using standard distributed computing frameworks such as Apache Hive and/or NoSQL systems.
* Ranking search results: current searchable encrypted database schemes do not support such a ranking functionality at the server. The goal is to investigate the feasibility of adding such functionality, while preserving a good level of privacy against the server.
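The core idea behind searchable encryption can be illustrated with a deliberately simplified Python sketch (standard library only): the client indexes documents under keyed keyword tokens, so the server can answer queries without ever seeing the keywords. Real schemes such as the one cited below are considerably more sophisticated and leak far less.

    import hashlib
    import hmac
    import secrets

    KEY = secrets.token_bytes(32)            # known only to the client (database owner)

    def token(keyword):
        """Deterministic keyed token for a keyword; the server only ever sees these."""
        return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

    # Client-side indexing: map keyword tokens to document ids (which would also be encrypted).
    docs = {1: ["diabetes", "insulin"], 2: ["fracture", "xray"]}
    index = {}
    for doc_id, words in docs.items():
        for w in words:
            index.setdefault(token(w), []).append(doc_id)

    # Server-side lookup: match the query token against the index.
    print(index.get(token("insulin"), []))   # -> [1]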

URLs and References
-Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Ro¸su, M.-C. and Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries, Advances in Cryptology–CRYPTO 2013, Springer, pp. 353–373. Available online at https://eprint.iacr.org/2013/169.pdf

Pre- and Co-requisite Knowledge
The student should have (1) Good programming skills and/or (2) Familiarity with the basics of cryptography and distributed computing environments (such as Hadoop, Hive, HBase).

Previously Offered: Yes (previously offered title: Searchable Encrypted Databases). We have updated the title and the content.


Dynamic Descriptive Interfaces: Visualising Content in Context
Supervisors: Joanne Evans and potentially more

Background
Records relating to children who spent time in institutional and other out-of-home care are spread across archival institutions and often appear as undocumented, invisible and inaccessible from a community-centred perspective. The Find and Connect Web Resource (http://www.findandconnect.gov.au/) goes some way towards addressing the challenges in their discovery, accessibility and interpretation. It documents the complex contextual network surrounding these records – the homes which created them, the organisations that now manage them, the legislative context which can help to inform their interpretation and the ways in which Care Leavers have made use of them in telling the story of their experiences.

However, navigating this complex network by traditional keyword search mechanisms is not easy. It requires a high degree of text and search literacy and familiarity with the archival context in which the records are held. Those managing the resource are keen to explore the ways in which visualisation techniques may be used to enhance usability so as to aid the discovery and accessibility of these records, as well as improving the ways in which entities and relationships are added to the database.


Aim and Outline

This project aims to explore how the complex contextual network surrounding records relating to children who have been in institutional and other out-of-home care can be visualised to aid in data entry, discovery and accessibility. It will involve investigating how faceted searching may be combined with visualization techniques to improve usability, as well as developing a set of accessible design principles to facilitate information equity and access. It sits at the intersection of recordkeeping metadata, visualisation, interaction and user experience design research.

URLs and References
Find and Connect Web Resource - http://www.findandconnect.gov.au/

McCarthy, G. J., & Evans, J. (2012). Principles for Archival Information Services in the Public Domain. Archives and Manuscripts, 40(1), 54–67. http://doi.org/10.1080/01576895.2012.670872

Whitelaw, M. (2012). Towards generous interfaces for archival collections. Comma, 2012(2), 123–132. http://doi.org/10.3828/comma.2012.2.13

Pre- and Co-requisite Knowledge
Some experience with HTML5 and Javascript is necessary. Find and Connect’s faceted search is built on Solr/Lucene using Encoded Archival Context (EAC-CPF), so an interest in working with these technologies and in the archival and recordkeeping space is essential. Experience and passion for novel and accessible interface design is also highly desirable.


Interactive visualisation of aerospace data using Virtual Reality displays
Supervisors: Maxime Cordeil, Tobias Czauderna (FIT), Callum Atkison (Eng)

Background
Aerospace data visualisation of 3D vector fields.

Aim and Outline
The project is focused on identifying and developing tools and an efficient workflow that will allow for the visualisation of the 3D velocity fields of large (8193 x 1000 x 1362 grid points and 340 GB per time step) direct numerical simulations of the complex turbulent flows studied in aerodynamics and aerospace engineering. The aim of the project is to establish a system by which we can efficiently render various aspects of this dataset and explore the relationships between different features and structures and their interaction in this complex flow. Typically, isosurface visualisations and animations of the 3D velocity fields are used to understand this data. This project will involve bringing this kind of visualisation to Virtual Reality displays such as the Oculus Rift and/or the CAVE2, a large virtual reality room with 80 high-definition screens and a tracking system.

URLs and References
https://ltrac.eng.monash.edu.au/

https://www.youtube.com/watch?v=GW2LRo2ZigQ&feature=iv&src_vid=10ZCn6KCRYs&annotation_id=annotation_652212

Pre- and Co-requisite Knowledge
3D programming


A Community Centred Data Aggregation Model
Supervisors: Vincent Lee, Yen Cheung, Chan Cheah

Background
Ratepayers Victoria (RV) is an incorporated association whose purpose is to advocate for Victorian ratepayers in matters of local government. Its mission is to assure and ensure good governance and compliance prevail in all council affairs. It also plays a role in developing and implementing state-wide systematic reforms to ensure councils:

  1. Are financially responsible and accountable to their ratepayers
  2. Demonstrate open government and good governance in both local government and state government
  3. Are socially and environmentally responsible in municipal service delivery and management.

To date, RV was part of the Fair Go Rates committee, and is now part of the Local Government Performance Reporting Framework (LGPRF) committee, developing better KPI metrics for Local Government.

RV also facilitates ratepayer advocates to better leverage technology to support their reform-contributing activities. Therefore, it also plays important roles in forming strategic partnerships and in developing ICT-enabled governance tools, including future Local Government big data analytics, data capture and reporting capabilities.

Aim and Outline
As a means of managing council complaints and compliments, this project’s goal is to design and develop a data aggregation model that provides a community-centred and traceable approach from registering to resolving council complaints across different escalating authorities. This model should also incorporate useful analytics reporting to the different stakeholders.

URLs and References:

  1. Council & complaints - A report on Current Practices and Issues (Feb 2015)
  2. Complaint and compliment management resources
  3. The LG Act - to track which sections of the law a complaint or compliment may breach or comply with.

Pre- and Co-requisite Knowledge
Prefer BIS majors otherwise none


A Community Centred Governance Model
Supervisors: Vincent Lee, Yen Cheung, Chan Cheah

Background
Ratepayers Victoria (RV) is an incorporated association whose purpose is to advocate for Victorian ratepayers in matters of local government. Its mission is to assure and ensure good governance and compliance prevail in all council affairs. It also plays a role in developing and implementing state-wide systematic reforms to ensure councils:

  1. Are financially responsible and accountable to their ratepayers
  2. Demonstrate open government and good governance in both local government (LG) and state government (SG) contexts.
  3. Are socially and environmentally responsible in municipal service delivery and management.

To date, RV was part of the Fair Go Rates committee, and is now part of the Local Government Performance Reporting Framework (LGPRF) committee, developing better KPI metrics for Local Government. RV also facilitates ratepayer advocates to better leverage technology to support their reform-contributing activities. Therefore, it also plays important roles in forming strategic partnerships and in developing ICT-enabled governance tools, including future Local Government big data analytics, data capture and reporting capabilities.

Aim and Outline
This project involves the development of a governance model that incorporates Gov 2.0 concepts and the KPIs of the Local Government.

URLs and References

  1. The LG Act
  2. The LGPRF Workbook
  3. Know Your Council website - https://knowyourcouncil.vic.gov.au/about
  4. About the LGPRF - http://www.dtpli.vic.gov.au/local-government/strengthening-councils/council-performance-reporting/about-the-performance-reporting-framework

Pre- and Co-requisite Knowledge
Prefer BIS majors otherwise none.


Change-point Detection in Hand Movement Data
Supervisors: Ingrid Zukerman, Jason Friedman (Tel Aviv University), Andisheh Partovi

Background
As part of an experiment on human perception and decision making (Physiology Department, Tel Aviv University), we have a set of hand movement data of subjects pointing in different directions, and possibly changing their direction in mid-action. We need to analyse these data in order to determine the time when the subjects have decided to change their pointing direction. This helps physiology researchers better understand the timeline of the decision making process in the brain. In order to identify the changes in the hand movement profile, the student can utilise statistical approaches such as Hidden Markov Models, which are often used in time series analysis and anomaly detection.

Aim and Outline
Developing a change-point detection algorithm to identify changes in the trajectory of hand movements as soon as they occur.
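A simple baseline is sketched below in Python (synthetic trajectory): flag the first sample at which the movement direction turns more sharply than a threshold. An HMM or similar statistical model, as mentioned above, would be the more principled approach for the project to pursue.

    import numpy as np

    def detect_direction_change(xy, angle_threshold_deg=45.0):
        """xy: (n, 2) array of hand positions sampled at a fixed rate."""
        v = np.diff(xy, axis=0)                              # velocity vectors
        for t in range(1, len(v)):
            a, b = v[t - 1], v[t]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom == 0:
                continue                                     # skip stationary samples
            cos = np.clip(np.dot(a, b) / denom, -1.0, 1.0)
            if np.degrees(np.arccos(cos)) > angle_threshold_deg:
                return t + 1                                 # sample where the change appears
        return None

    # Move right for 50 samples, then veer upward: the change is detected near sample 50.
    traj = np.vstack([np.column_stack([np.arange(50), np.zeros(50)]),
                      np.column_stack([50 + 0.2 * np.arange(50), np.arange(50)])])
    print(detect_direction_change(traj))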

Pre- and Co-requisite Knowledge
FIT3080/FIT5047 Intelligent systems or equivalent is a mandatory prerequisite, and knowledge of time-series analysis is highly desirable.


Improving autonomous guidance within BB-8™ rolling droid by adding higher processing capabilities

Supervisors: Asad Khan, Richard Spindler (LateralBlast), David Hellewell (Intel Australia)

Background
The project will establish two-way communications between a low energy (LE) Bluetooth device, such as Sphero’s BB-8™ rover droid, and a computer for obstacle avoidance. The computer will host a parallel genetic algorithm (GA) and Fast Artificial Neural Net (FANN) for calculating rapid solutions to detected obstacles. A parallel implementation of the GA in C will be provided. This project will also provide on loan the following items. (1) A BB-8™ droid, courtesy LateralBlast. (2) Intel Edison IoT computer, courtesy Intel.

Aim and Outline
LE Bluetooth connectivity will be established using Sphero’s Orbotix JavaScript SDK [1] for passing the droid’s sensory data to a higher speed processor. This processor will analyse the sensory data in real-time, using parallel GA and FANN, to compute a suitable path for obstacle avoidance. The limited range of the Bluetooth connection requires an intermediate computing device, which can be placed quite close to the droid for practical use. For this purpose, an Intel Edison IoT board [2] shall be made available for final testing of the software.

URLs and References
[1] Orbotix JavaScript SDK https://www.npmjs.com/package/sphero
[2] Intel Edison IoT https://software.intel.com/en-us/iot/hardware/edison

Pre- and Co-requisite Knowledge
C/C++, and Java/Javascript. Knowledge of MPI (message passing interface) will be highly regarded.


A fast deep learning framework for multiple scene analyses
Supervisors: Asad Khan, Y. Ahmet Sekercioglu (Heudiasyc France)

Background
The framework is expected to facilitate a number of applications requiring real-time image classification among multiple video streams. One such area is localisation of swarm robots.

Aim and Outline
This project will implement a parallel-distributed framework for rapid processing of multiple video streams for image classification. The code will be developed using the deep learning (dnn) module within OpenCV [1]. This code will be networked using the message passing interface (MPI) [2] or a similar library. The code will thus be able to analyse an increasing number of video streams with relatively small increases in processing time.
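The per-stream classification step might look like the Python sketch below (the model file names are placeholders, following the OpenCV tutorial cited in [1]); the project would then wrap this per-frame work in MPI or a similar library so that many streams are processed in parallel.

    import cv2

    # Pre-trained Caffe model files -- placeholders for whichever network is used.
    net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")

    def classify_frame(frame):
        """Return the index of the highest-scoring class for one video frame."""
        blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0, size=(224, 224),
                                     mean=(104, 117, 123))
        net.setInput(blob)
        scores = net.forward()
        return int(scores.flatten().argmax())

    cap = cv2.VideoCapture(0)      # one of the incoming video streams
    ok, frame = cap.read()
    if ok:
        print(classify_frame(frame))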

URLs and References
[1] Load Caffe framework models in OpenCV http://docs.opencv.org/trunk/d5/de7/tutorial_dnn_googlenet.html#gsc.tab=0
[2] Open MPI Library, https://www.open-mpi.org/

Pre- and Co-requisite Knowledge

C/C++ and Python. Knowledge of MPI (message passing interface) and Java/Javascript will be highly regarded.


Mobile App for injury surveillance in Cricket

Supervisors: Asad Khan and Naj Soomro (Faculty of Medicine, Nursing and Health Sciences & CricDoc Pvt. Ltd.)

Background
Cricket is the most popular summer sport in Australia. At junior levels of cricket, injury incidence ranges between 15-49% [1,2]. Traditionally, injury surveillance has relied upon the use of paper-based forms or complex computer software [3,4]. This makes injury reporting laborious for the staff involved. A mobile application that can be used on the field may be a solution to better injury surveillance in cricket. CricDoc Pvt Ltd made an Android-based mobile app (CricPredict) in 2015 as a prototype.

Aim and Outline
Re-design the existing CricPredict injury surveillance app so that it can run across platforms and provides a better UI. The app will be field tested with Mildura West Cricket Club. The resulting protocol for the app, along with validation of injury data, will be published as a protocol paper in the Sports Technology Journal.
The student may be offered a Monash Summer Research Scholarship of $1500 if their application is successful in the SRS summer round.

URLs and References

  1. Orchard J, James T, Kountouris A, Blanch P, Sims K, Orchard J. Injury report 2011: Cricket Australia. Sport Health. 2011;29(4):16.
  2. Das NS, Usman J, Choudhury D, Abu Osman NA (2014) Nature and Pattern of Cricket Injuries: The Asian Cricket Council Under-19, Elite Cup, 2013. PLoS ONE 9(6): e100028. doi:10.1371/journal.pone.0100028
  3. Ranson C, Hurley R, Rugless L, Mansingh A, Cole J (2011) International cricket injury surveillance: a report of five teams competing in the ICC Cricket World Cup 2011. Br J Sports Med 47(10): 637–43.
  4. Sports Medicine Australia, Cricket Injury Reporting form.
  5. Soomro, N., R. Sanders, and M. Soomro. "Cricket injury prediction and surveillance by mobile application technology on smartphones." Journal of Science and Medicine in Sport 19 (2015): e6.
  6. www.cricdoc.com

Pre- and Co-requisite Knowledge
A knowledge of Mobile programming, App Development & SQL servers.
Working knowledge of cross platform development applications like Meteor or PhoneGap will be useful.


BI in healthcare: comparative study
Supervisors: Rob Meredith and Frada Burstein

Background
The field of business intelligence is reaching a level of maturity with many good examples of successful implementation in some industry sectors, e.g. finance, professional services, manufacturing, telecommunications, etc. The project will consider the current practice of successful business intelligence implementation in other industries and compare it with the special needs of healthcare institutions. An exploratory case study at a large Australian hospital will be offered as a research setting.

Aim and Outline
The aim of the project is to come up with generic principles of business intelligence success and apply them to a healthcare case study.

URLs and References:
Pre- and Co-requisite Knowledge
Completed BI units (FIT5195 and/or FIT5097)


Quality of data framework for supporting healthcare information management
Supervisors: Frada Burstein and Rob Meredith

Background
Data quality issues come up very high on the agenda when dealing with organisational decision-making. Prior research has demonstrated that there are sets of criteria which should be taken into consideration as a framework to evaluate the quality of data, and such a framework has to be tailored depending on the context of the organisation and the purpose of the evaluation.

Aim and Outline
The project will take a generic framework for data quality evaluation as a starting point and demonstrate its applicability to a large Australian healthcare institution. It will follow design science research to refine the framework through applying it to suit the needs of the information management team.

URLs and References:

Pre- and Co-requisite Knowledge:
Knowledge of systems analysis and design and of decision support principles will be useful; completion of information management and knowledge management units will also be helpful.


Pathfinding for Games
Supervisors: Daniel Harabor

Background
Pathfinding is a fundamental operation in video game AI: virtual characters need to move from location A to location B in order to explore their environment, gather resources or otherwise coordinate themselves in the course of play. Though simple in principle, such problems are surprisingly challenging for game developers: paths should be short and appear realistic, but they must be computed very quickly, usually with limited CPU resources and using only small amounts of memory.

Aim and Outline
In this project you will develop new and efficient pathfinding techniques for game characters operating in a 2D grid environment. There are many possibilities for you to explore. For example, you might choose to investigate a class of "symmetry breaking" pathfinding techniques which speed up search by eliminating equivalent (and thus redundant) alternative paths. Another possibility involves dynamic settings where the grid world changes (e.g. an open door becomes closed) and characters must re-plan their routes. A third possibility is multi-agent pathfinding, such as cooperative settings where groups of characters move at the same time or where one character tries to evade another.

Successful projects may lead to publication and/or entry to the annual Grid-based Path Planning Competition.
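For students unfamiliar with the area, the Python sketch below shows the textbook baseline these techniques aim to improve on: A* search on a 4-connected grid with a Manhattan-distance heuristic.

    import heapq

    def astar(grid, start, goal):
        """grid: list of strings, '.' = free, '#' = blocked; start/goal: (row, col)."""
        def h(p):
            return abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
        open_list = [(h(start), 0, start)]
        g = {start: 0}
        while open_list:
            _, cost, cur = heapq.heappop(open_list)
            if cur == goal:
                return cost
            if cost > g.get(cur, float("inf")):
                continue                                       # stale queue entry
            r, c = cur
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != '#':
                    ng = cost + 1
                    if ng < g.get((nr, nc), float("inf")):
                        g[(nr, nc)] = ng
                        heapq.heappush(open_list, (ng + h((nr, nc)), ng, (nr, nc)))
        return None                                            # no path exists

    grid = ["....",
            ".##.",
            "...."]
    print(astar(grid, (0, 0), (2, 3)))   # -> 5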

URLs and References
http://www.harabor.net/daniel/index.php/pathfinding/

Pre- and Co-requisite Knowledge
Students interested in this project should be enthusiastic about programming.
They should also have some understanding of AI Search and exposure to the C
and/or C++ programming language.


Analysing game hardness using optimisation
Supervisors: Pierre Le Bodic

Background
Many games can be modelled using Integer Programming (IP) [1], a mathematical abstraction. A game cast as an IP model can be solved with a specialised software called IP solver (e.g. IBM Cplex [2]).
Depending on the model, an IP solver can find a solution seemingly instantly, or not terminate within our lifetimes. This phenomenon has far-reaching implications, not only for games, but for many real-world applications, ranging from chip design to cryptography.

Aim and Outline
The aim of this project is to determine to what extent the hardness of an IP model can be predicted before or while solving it [3, 4, 5].
We will in particular test our discoveries on the sudoku game, for which many different grids are already classified from easy to hard.
Other such games and problems can be considered.
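To give a flavour of the modelling side, here is a minimal Python sketch that casts sudoku as an IP using the open-source PuLP library (the project could equally use a commercial solver such as CPLEX); the clue set is a hypothetical fragment, and hardness shows up in how long the solve takes.

    import pulp

    R = C = V = range(9)                                 # rows, columns, digit indices
    prob = pulp.LpProblem("sudoku", pulp.LpMinimize)
    prob += pulp.lpSum([])                               # constant objective: pure feasibility
    x = pulp.LpVariable.dicts("x", (R, C, V), cat="Binary")   # x[r][c][v] = 1 -> digit v+1 at (r, c)

    for r in R:
        for c in C:
            prob += pulp.lpSum(x[r][c][v] for v in V) == 1            # one digit per cell
    for v in V:
        for r in R:
            prob += pulp.lpSum(x[r][c][v] for c in C) == 1            # each digit once per row
        for c in C:
            prob += pulp.lpSum(x[r][c][v] for r in R) == 1            # each digit once per column
        for br in range(3):
            for bc in range(3):
                prob += pulp.lpSum(x[3*br + i][3*bc + j][v]
                                   for i in range(3) for j in range(3)) == 1   # once per box

    clues = {(0, 0): 5, (1, 3): 7, (4, 4): 9}            # hypothetical partial puzzle
    for (r, c), d in clues.items():
        prob += x[r][c][d - 1] == 1

    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    print(pulp.LpStatus[prob.status])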

URLs and References
[1] https://en.wikipedia.org/wiki/Integer_programming
[2] https://www.ibm.com/software/commerce/optimization/cplex-optimizer/
[3] http://dx.doi.org/10.1287/ijoc.1040.0107
[4] http://dx.doi.org/10.1287/ijoc.1100.0405
[5] http://dx.doi.org/10.2307/2005469

Pre- and Co-requisite Knowledge
A liking for (Applied) Maths is necessary.


Algorithm analysis techniques to improve Integer Programming solvers
Supervisors: Pierre Le Bodic

Background
Industry problems can often be modelled using Integer Programming (IP) [1], a mathematical abstraction. IP solvers (e.g. IBM Cplex [2]) provide an optimal solution to any problem described in that mathematical setting, but this process can take a long time. To be efficient, state-of-the-art IP solvers combine multiple solving algorithms. Each algorithm is usually well understood theoretically, but the combination used in solvers is not.

Aim and Outline
We will use algorithm analysis techniques (as in e.g. [3]) to theoretically investigate how algorithms are combined in state-of-the-art IP solving, and try to come up with better algorithms.

URLs and References
[1] https://en.wikipedia.org/wiki/Integer_programming
[2] https://www.ibm.com/software/commerce/optimization/cplex-optimizer/
[3] http://arxiv.org/abs/1511.01818

Pre- and Co-requisite Knowledge
A taste for algorithm analysis and computational complexity is desirable.


Metaphor Detection in Tweets Posted by Fibromyalgia Sufferers
Supervisors: Pari Delir Haghighi, Yong-Bin Kang

Background
Distressing and unpleasant emotional content and statements are reported to have adverse physiological effects on patients. Studies show that ‘statements with negative emotional content can increase patient anxiety and pain when compared with those patients receiving neutral or positive comments’. Fibromyalgia sufferers often use metaphors to describe their pain (e.g. ‘sharp as a knife’). Identifying and studying metaphors posted by patients on Twitter could shed new light on the impact of communication behaviour on the pain experience. Metaphor detection is a challenging task in the field of natural language processing, and becomes even more difficult when it is performed over short messages such as tweets of up to 140 characters posted on Twitter.

Aim and Outline
This project first aims to explore and employ novel approaches for metaphor detection in tweets by factoring in semantic relations of words. Second, it will use the results to provide better understanding of the communication behaviour among individuals with fibromyalgia.

Pre- and Co-requisite Knowledge
NLP and text mining knowledge, programming knowledge (Java preferred)


Game theoretical models of reputation dynamics
Supervisors: Julian Garcia

Background
This project uses game theory and computational models to study the dynamics of reputation and cooperation.

Aim and Outline
Reputation is important in enabling reliable interactions between strangers across many domains, including a host of applications online [1]. Simple mathematical models demonstrate that strangers can learn to cooperate with each other sustainably, if doing so will enhance their reputation [2, 3]. These models use game theory to understand how agents learn to coordinate their independent actions [4]. While the resulting models of reputation dynamics are insightful, they are often based on games that are too simple and detached from reality [5]. This project addresses that gap by combining agent-based models and game theory.
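A stylised sketch of the image-scoring idea in [2] is shown below in Python (illustrative parameter values only): donors help recipients whose reputation is good enough, helping raises the donor's own reputation, and strategies that earn more spread by imitation.

    import random

    N, COST, BENEFIT, ROUNDS = 100, 1, 3, 20_000
    strategy = [random.choice([-1, 0, 1]) for _ in range(N)]   # donate if recipient's image >= threshold
    image = [0] * N
    payoff = [0.0] * N

    for _ in range(ROUNDS):
        donor, recipient = random.sample(range(N), 2)
        if image[recipient] >= strategy[donor]:                # conditional cooperation
            payoff[donor] -= COST
            payoff[recipient] += BENEFIT
            image[donor] = min(image[donor] + 1, 5)            # helping improves reputation
        else:
            image[donor] = max(image[donor] - 1, -5)           # refusing harms it
        if random.random() < 0.01:                             # occasional strategy imitation
            a, b = random.sample(range(N), 2)
            if payoff[b] > payoff[a]:
                strategy[a] = strategy[b]

    print(sum(1 for s in strategy if s <= 0) / N)              # share of fairly generous strategies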

URLs and References
[1] Resnick, Paul, et al. "Reputation systems." Communications of the ACM 43.12 (2000): 45-48.

[2] M. A. Nowak and K. Sigmund. Evolution of indirect reciprocity by image scoring. Nature, 393:573–577, 1998.

[3] M. A. Nowak and K. Sigmund. Evolution of indirect reciprocity. Nature, 437:1291–1298, 2005.

[4] M. A. Nowak. Five rules for the evolution of cooperation. Science, 314:1560–1563, 2006.

[5] FP Santos, FC Santos, et al. Social norms of cooperation in small-scale societies. PLoS Comput Biol, 12(1):e1004709, 2016. **

** Key reference.

Pre- and Co-requisite Knowledge
Problem solving skills, an interest in applied mathematics and simulation and skills in Python, C or both.


A comparative study of business communication effectiveness of entity-relationship and fact-based modelling
Supervisors: Rob Meredith and Peter O'Donnell

Project Overview
Entity-Relationship (ER) Modelling is the traditional approach to database and data warehouse modelling. Although Fact-Based Modelling (FBM) has been studied academically for many years, it has had comparatively little penetration in the data management industry. Recent work has developed a structured natural language representation for Fact-Based Models. The textual representation provides a precise and concise model definition, and its proponents claim that it is more easily understood by business domain experts than traditional approaches. Working with our industry partner, Infinuendo, this project will involve comparing the effectiveness of conceptual and logical data model communication for a mid-sized business application between ER and textual FBM approaches. The student will need to (1) develop a working understanding of both modelling approaches, (2) develop the evaluation criteria for business domain experts to understand, find mistakes and verify the conceptual and logical data models, and (3) run the experiment with sufficient numbers of participants so as to establish concrete results.

URLs and References
http://www.infinuendo.com

Pre- and Co-requisite Knowledge
Strong conceptual and logical data modelling capabilities. Excellent written English. Ability to take up new modelling techniques quickly.


Design of a fact-based modelling parser for generating data vault and dimensional models
Supervisors: Rob Meredith and Peter O'Donnell

Project Overview
An open area of research in Fact-Based Modelling is the design of a fact-based query language, and the development of a parser for the language and a generator for data vault and dimensional forms. A number of research topics could be specified in this space depending on the background of the student and the parallel research/development work being carried out by Infinuendo.

2.1. Review of query languages, e.g. SQL, Sparkl, Prolog, Andl and CQL.
2.2. Design and documentation of schema and source-to-target mappings of data vault data warehouses. Work carried out under this project would involve working with the body of open source code published and maintained by Infinuendo. We would anticipate that relevant aspects of the student’s work, if any, would be published under the same open source licence.

URLs and References
http://www.infinuendo.com

Pre- and Co-requisite Knowledge
Strong conceptual and logical data modelling capabilities. Excellent written English. Ability to take up new modelling techniques quickly. Strong programming capabilities.


Deep Learning for Playing Games
Supervisors: Reza Haffari

Background
Deep Learning has revolutionised many subfields of artificial intelligence, including 'game playing'. The success of Google Deepmind's intelligent computer programs in playing Go and Atari games represents a breakthrough in the field.

Aim and Outline
In this project, we look into the deep learning technology behind Deepmind's game players and try to understand and improve it. In particular, we look into employing better neural reinforcement learning algorithms for learning intelligent agents.
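As background, the Python sketch below shows tabular Q-learning on a toy 'walk to the goal' task; Deepmind's agents replace the table with a deep neural network, but the underlying reinforcement-learning update is the same idea.

    import random

    N_STATES, ACTIONS = 6, (-1, +1)            # states 0..5, goal at state 5
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, eps = 0.1, 0.95, 0.1

    for episode in range(2000):
        s = 0
        while s != N_STATES - 1:
            if random.random() < eps:                          # explore
                a = random.choice(ACTIONS)
            else:                                              # exploit current estimates
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2

    print([round(max(Q[(s, a)] for a in ACTIONS), 2) for s in range(N_STATES)])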

URLs and References
https://en.wikipedia.org/wiki/AlphaGo

Pre- and Co-requisite Knowledge
Data Structures and Algorithms (FIT2004), Intelligent Systems (FIT3080)


Deep Learning for Text Understanding
Supervisors: Reza Haffari

Background
Deep Learning has revolutionised many subfields of artificial intelligence, including automatic text understanding by machines. Recent successes based on this approach include high-performing machine translation models, text summarisation models, etc.

Aim and Outline
In this project we aim to build neural models for better analysis of text. Potential applications include machine translation, text summarisation, textual reasoning, etc.

URLs and References
http://www.kdnuggets.com/2015/03/deep-learning-text-understanding-from-scratch.html
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/#more-548

Pre- and Co-requisite Knowledge
Data Structures and Algorithms (FIT2004), Intelligent Systems (FIT3080)


Clustering for hierarchical time series forecasting with big time series data
Supervisors: Christoph Bergmeir

Background
Time series forecasting with large amounts of data is becoming more and more important in many fields. In this project, we will work with data from a large optical retail company that sells up to 70,000 different products in 44 different countries in over 6000 stores worldwide. The goal is to produce accurate sales forecasts, which the company can use for store replenishment and -- more importantly -- supply chain management. The products are mainly produced in China, and have several weeks of lead time from production until they can be sold in a store.

Aim and Outline
The main challenge of this dataset is that many of the products are similar but have a short history, as the assortment changes relatively quickly with fashion trends, so just using univariate time series forecasting may often not be possible due to this short history. In this project, we aim to apply different clustering techniques (k-means, DBSCAN, MML-based clustering) to features extracted from the time series and features that are known independently (master data). In this way, we can determine the similarity between series and can then use these similarities in subsequent forecasting steps, to achieve more accurate forecasts.
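The clustering step could start from something as simple as the Python sketch below (synthetic weekly sales series and a few hand-picked features): extract a feature vector per series, then group similar products with k-means so that short-history products can borrow information from their cluster.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    series = [rng.poisson(lam, size=52) for lam in (3, 3, 20, 22, 8, 9)]   # weekly sales per product

    def features(y):
        """Level, volatility and lag-1 autocorrelation of one sales series."""
        y = np.asarray(y, dtype=float)
        return [y.mean(), y.std(), np.corrcoef(y[:-1], y[1:])[0, 1]]

    X = StandardScaler().fit_transform([features(y) for y in series])
    print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X))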

Pre- and Co-requisite Knowledge
R programming, Data Science, Machine Learning, Clustering techniques


Is Digital Health a Problem or a Solution: Health Informatics cases analysis and design
Supervisors: Prof Frada Burstein, A/Prof Henry Linger (FIT), Prof Marilyn Baird

Background
The Digital Health agenda is currently highly relevant to the delivery of efficient health care in Australia and internationally. The success of digital health is essentially related to the technology infrastructure and information systems underlying its implementation. There are many known successful cases of digital health implementation, but perhaps even more examples where information systems and information technology resulted in a failure.

Aim and Outline
The aim of this project is to collect and analyse a range of successful and problematic cases of healthcare delivery where the information technology went right or wrong. The result of the analysis will be described and presented as a library of cases suitable for teaching health informatics to medical students.

Pre- and Co-requisite Knowledge
Interest in health informatics is a bonus.


Efficient algorithms for scheduling visits to student placements
Supervisors: Sue Bedingfield, Kerri Morgan and Dhananjay Thiruvady


Background, Aim and Outline

This project aims to develop an efficient algorithm for scheduling visits to students on placement. Each student must be visited by one of a group of visitors, subject to a number of constraints including the availability of students, their supervisors and visitors, the time required to travel between locations, the workload of each visitor, and a preference for a different visitor to visit a given student on each visit. Ideally, we want to minimise the distances travelled between locations, ensure that visits at a single location occur sequentially and are allocated to the same visitor, and minimise the amount of time required by a visitor to complete their workload of visits.

Scheduling is itself hard, and this problem is potentially harder due to the many requirements. In this project, the student will explore how heuristics and mixed integer programming can be applied to obtain an efficient algorithm for this problem. An effective solution to this problem has wide-ranging applications, particularly with the increasing number of student placement programs.
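
To make the constraints above concrete, here is a toy greedy heuristic in plain Python that assigns visits to visitors while respecting availability, balancing workload and preferring a visitor the student has not seen before. All names, timeslots and availabilities are invented; the project itself would combine heuristics of this kind with a mixed integer programming model.

# Toy greedy heuristic for assigning placement visits to visitors (illustrative data only).
visits = [  # (student, timeslot)
    ("student_A", 1), ("student_B", 1), ("student_C", 2), ("student_D", 3),
]
visitors = {"visitor_1": {1, 2, 3}, "visitor_2": {1, 3}}  # available timeslots
workload = {v: 0 for v in visitors}
last_visitor = {}   # remembers who visited each student previously
schedule = []

for student, slot in visits:
    candidates = [v for v, avail in visitors.items() if slot in avail]
    # prefer a visitor the student has not seen before, then balance workload
    candidates.sort(key=lambda v: (v == last_visitor.get(student), workload[v]))
    if not candidates:
        raise ValueError(f"no visitor available for {student} in slot {slot}")
    chosen = candidates[0]
    workload[chosen] += 1
    last_visitor[student] = chosen
    schedule.append((student, slot, chosen))

print(schedule)
print(workload)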

Pre- and Co-requisite Knowledge
Required: Strong programming skills. Able to write scripts for tasks such as re-formatting data and collating results.
Preferred: An interest in learning about optimisation techniques applied to real world problems.


Car sequencing
Supervisors: Kerri Morgan and Dhananjay Thiruvady

Background, Aim and Outline

Car manufacturers typically encounter the problem of determining a sequence of cars to be scheduled on an assembly line. The cars require several options (e.g. air conditioners, sun roofs, etc.) and cars requiring the same options need to be spaced out far enough apart such that the stations installing the options can effectively deal with the demand.

We have developed mixed integer programming-based heuristic algorithms to solve car sequencing instances consisting of 500 cars. The aim of this project will be to extend the current algorithms, or develop new ones, to deal with many more cars (e.g., 2000 cars). This will also require creating/extending a problem generator that can be used to create large and interesting problem instances.
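
For concreteness, the sketch below evaluates the classic car-sequencing spacing rule (at most p cars requiring an option in any q consecutive positions) and counts violations for a candidate sequence. The rule parameters and cars are invented; an evaluation function like this is the typical building block of both heuristic and MIP-based approaches.

# Sketch of the p/q spacing rule used in car sequencing; data is illustrative only.
def count_violations(sequence, options):
    """sequence: list of cars, each a set of option names.
    options: dict option -> (p, q) capacity rule."""
    violations = 0
    for opt, (p, q) in options.items():
        for start in range(len(sequence) - q + 1):
            window = sequence[start:start + q]
            if sum(opt in car for car in window) > p:
                violations += 1
    return violations

cars = [{"sunroof"}, {"aircon"}, {"sunroof", "aircon"}, {"sunroof"}, set()]
rules = {"sunroof": (2, 3), "aircon": (1, 2)}   # e.g. at most 2 sunroofs in any 3 consecutive cars
print(count_violations(cars, rules))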

Pre- and Co-requisite Knowledge
* Required: Programming skills in C++ or python
* Preferred: An interest in learning about optimisation techniques applied to real world problems.


Deep learning to identify pool pump power consumption

Supervisors: Lachlan Andrew and Zahraa Abdallah

Background
With an increase in the use of intermittent renewable energy, like wind in South Australia, there is an increasing need for users to adjust their power consumption to match availability ("demand response"). Pool filtration pumps consume large amounts of power, and people usually do not mind too much when they are operated, so ceding control of pool pumps to the electricity company -- in exchange for a reduced bill -- is a promising form of demand response. To evaluate the potential of this, it is necessary to find out when people currently run their pool pumps.

Aim and Outline
The aim of this project is to implement a convolutional neural network to identify the times of use of pool pumps that are on timers. This problem is equivalent to looking for rectangles in a very noisy image, with non-stationary, highly correlated and highly anisotropic noise (i.e., the noise is very different in different parts of the image, the noise at one pixel is very similar to the noise at nearby pixels, and the correlation is different in different directions).
The convolutional neural network will be trained on several images where the rectangles have been identified manually. If time permits, the performance of the neural network will be compared with that of an existing heuristic algorithm.
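
A minimal PyTorch sketch of the kind of network involved is given below, framing the task as per-pixel labelling of a half-hour-by-day "image" of power readings. The architecture, input shape and random training data are assumptions for illustration, not the project's model.

# Minimal PyTorch sketch: label each pixel of a (half-hour x day) image as inside/outside
# a pump-on rectangle. Architecture and data are illustrative only.
import torch
import torch.nn as nn

class PumpNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),            # per-pixel logit
        )

    def forward(self, x):
        return self.net(x)

model = PumpNet()
x = torch.randn(4, 1, 48, 365)                   # 4 households, 48 half-hours x 365 days
y = (torch.rand(4, 1, 48, 365) > 0.9).float()    # fake manual labels
loss = nn.BCEWithLogitsLoss()(model(x), y)
loss.backward()                                  # one illustrative training step (no optimiser shown)
print(float(loss))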

Pre- and Co-requisite Knowledge
Basic maths is needed.
Familiarity with the basics of neural networks is an advantage, but not necessary.
Familiarity with either Matlab or R is an advantage.


Robustness of Australian gas and electricity networks
Supervisors: Rajab Khalilpour, Ariel Liebman, David Green

Background
The robustness of Australia's national electricity grid has been questioned on several occasions, such as the Basslink failure in 2015 and the South Australia blackout in 2016.

Aim and Outline
The aim is to study the topology of Australia's electricity and gas networks in order to assess the networks' resilience to disturbance as well as their cybersecurity.

Pre- and Co-requisite Knowledge
Network theory


Data analysis and visualisation for electrification of remote underdeveloped locations
Supervisors: Rajab Khalilpour, Ariel Liebman, Lachlan Andrew, Tim Dwyer


Background
A developing country has over 80,000 villages, with about 10,000 being unelectrified. The government has allocated an insufficient budget for electrifying all of these villages. How would you prioritize the villages and select the right ones to be electrified first?

Aim and Outline
The aim of this project is to utilize machine learning clustering techniques to assess a database with 80,000 rows and develop decision support tools for helping decision makers find the optimal (least-cost and fair) set of villages for electrification.
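
As one very simple illustration of the decision-support angle (and not the clustering approach the project targets), the sketch below ranks hypothetical villages by a cost-effectiveness score and greedily selects within a budget. All column names, numbers and the greedy rule are invented.

# Illustrative budget-constrained prioritisation of villages (hypothetical data and rule).
import pandas as pd

villages = pd.DataFrame({
    "village":    ["A", "B", "C", "D"],
    "population": [1200, 300, 800, 2500],
    "grid_cost":  [90000, 20000, 100000, 400000],   # cost to electrify
})
budget = 150000

villages["score"] = villages["population"] / villages["grid_cost"]   # people per dollar
ranked = villages.sort_values("score", ascending=False)

selected, spent = [], 0
for _, row in ranked.iterrows():
    if spent + row["grid_cost"] <= budget:
        selected.append(row["village"])
        spent += row["grid_cost"]

print("electrify first:", selected, "total cost:", spent)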

Pre- and Co-requisite Knowledge
Data analysis, machine learning, decision analysis


Forecast of electricity load and weather conditions
Supervisors: Rajab Khalilpour, Ariel Liebman, Lachlan Andrew, Souahib Ben Taieb

Background
With distributed generation and storage reaching our houses, demand forecasting becomes essential in order to manage our local energy system.

Aim and Outline
You have a house with a PV and battery system, and you want to manage the system in a way that minimises your energy bill. The key requirement for such an aim is to predict the weather conditions and your load over the next day. With these data, you could develop a scheduling method for your PV-battery system. The goal of this project is to develop efficient forecasting algorithms for short-term projection of electricity demand and weather conditions (e.g. temperature, humidity, wind speed, and solar radiation). Students with prior knowledge of (or motivation to master) statistical analysis are encouraged to apply for this project.
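
A useful first step in any such project is a seasonal-naive baseline. The sketch below predicts tomorrow's half-hourly load as the profile observed a week earlier and reports the mean absolute error on synthetic data; the data and numbers are illustrative only, and any serious model should beat this baseline.

# Seasonal-naive baseline for half-hourly load forecasting (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
periods_per_day = 48
days = 28
t = np.arange(days * periods_per_day)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / periods_per_day) + 0.1 * rng.standard_normal(t.size)

history, actual = load[:-periods_per_day], load[-periods_per_day:]
forecast = history[-7 * periods_per_day:-6 * periods_per_day]   # same day of the week, last week

mae = np.mean(np.abs(forecast - actual))
print(f"seasonal-naive MAE: {mae:.3f}")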

Pre- and Co-requisite Knowledge
Data analysis


Modelling of energy systems with PV, stationary battery, and electric vehicle
Supervisors: Rajab Khalilpour, Pierre Le Bodic, Ariel Liebman, John Betts

Background
We have just passed a tipping point in PV uptake in Australia and some other places around the world. The next energy revolution is expected to be battery energy storage. Germany has just passed a law mandating that, from 2030, all new cars be electric.

Aim and Outline
Imagine your family in 2030, when your house has rooftop PV and a stationary battery to store surplus PV generation. You also have an electric car that you can charge at home, at work, or at e-stations (the petrol stations of tomorrow). At night, you might connect your stationary or car battery to supply your house's energy demand! The objective of this project is to develop an optimization scheduling program for energy management (minimum bill) of your future house.
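
A minimal sketch of such a scheduling model, written as a linear program with the PuLP library, is shown below. The tariff, PV and load profiles, battery size and efficiency are made-up numbers, and the real project would use a richer (mixed-integer) formulation that also covers the electric vehicle.

# Minimal bill-minimising battery schedule as a linear program (PuLP); all numbers illustrative.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus

T = 24
price = [0.30 if 17 <= t <= 21 else 0.15 for t in range(T)]          # $/kWh
pv    = [max(0.0, 3.0 * (1 - abs(t - 12) / 6)) for t in range(T)]    # kW, midday peak
load  = [0.5 + (1.5 if 18 <= t <= 22 else 0.0) for t in range(T)]    # kW, evening peak

prob = LpProblem("home_energy", LpMinimize)
imp = [LpVariable(f"imp_{t}", lowBound=0) for t in range(T)]               # grid import
ch  = [LpVariable(f"ch_{t}",  lowBound=0, upBound=2.0) for t in range(T)]  # battery charge
dis = [LpVariable(f"dis_{t}", lowBound=0, upBound=2.0) for t in range(T)]  # battery discharge
soc = [LpVariable(f"soc_{t}", lowBound=0, upBound=10.0) for t in range(T)] # state of charge

prob += lpSum(price[t] * imp[t] for t in range(T))                   # objective: energy bill
for t in range(T):
    prob += pv[t] + imp[t] + dis[t] >= load[t] + ch[t]               # supply meets demand
    prev = soc[t - 1] if t > 0 else 5.0                              # start half full
    prob += soc[t] == prev + 0.9 * ch[t] - dis[t]                    # simple battery dynamics

prob.solve()
print(LpStatus[prob.status], "bill =", sum(price[t] * imp[t].value() for t in range(T)))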

URLs and References
http://www.springer.com/gp/book/9789812876515

Pre- and Co-requisite Knowledge
Mixed-integer programming and Simulink (can learn during the thesis)

Multi-attribute decision making approaches for evaluation of energy storage technologies
Supervisors: Rajab Khalilpour, Aldeida Aleti, Pierre Le Bodic, Ariel Liebman

Background
We have just passed a tipping point in PV uptake in Australia and some other places around the world. The next energy revolution is expected to be energy storage. There are several energy storage technologies with various features. This makes the technology selection process complex.

Aim and Outline

You are thinking of buying an energy storage system to store your surplus PV generation to use later (rather than selling to the grid at a low price). There are several energy storage products in the market, with various features (lifetime, cost, charge time, depth of discharge, energy throughput, weight, volume, etc.) which makes the decision-making complex. The objective of this research is to utilize Multi-Attribute Decision Making approaches for evaluation of energy storage systems.
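
As a small illustration, the sketch below applies simple additive weighting, one standard multi-attribute decision-making approach, to a few hypothetical storage products. The products, attribute values and weights are invented purely for illustration.

# Simple additive weighting (SAW) over invented energy-storage products.
import numpy as np

products   = ["Battery A", "Battery B", "Battery C"]
attributes = ["cost ($)", "lifetime (cycles)", "depth of discharge (%)", "weight (kg)"]
benefit    = [False, True, True, False]           # is higher better for this attribute?
weights    = np.array([0.4, 0.3, 0.2, 0.1])

X = np.array([
    [9000, 6000, 90, 120],
    [7000, 4000, 80, 100],
    [11000, 8000, 95, 150],
], dtype=float)

# normalise each column to [0, 1], flipping cost-type attributes
norm = np.empty_like(X)
for j in range(X.shape[1]):
    col = X[:, j]
    norm[:, j] = (col - col.min()) / (col.max() - col.min())
    if not benefit[j]:
        norm[:, j] = 1.0 - norm[:, j]

scores = norm @ weights
for name, s in sorted(zip(products, scores), key=lambda p: -p[1]):
    print(f"{name}: {s:.3f}")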

Pre- and Co-requisite Knowledge
Background or interest in learning (behavioural) decision analysis



Pre-Visit Wayfinding for the Vision Impaired using a Prototype Interactive Controller and a Virtual 3D Environment Deployed on a Mobile Phone
Supervisors: Michael Morgan

Background:
Wayfinding and navigation in unfamiliar places is a challenging task for those with a vision impairment, as it is difficult to convey spatial information to them before they visit a site. While solutions in the form of tactile diagrams are available, they are costly to produce, do not convey some spatial information well, are limited in the contextual information that they can provide, and have issues in relating the scale of the diagram to the scale of the actual environment. What is needed is a more interactive and 'embodied' way for vision impaired users to explore a location before they visit it, in order to create a mental model for navigating in the space and finding their way to significant target locations.

Interactive objects can be developed as controllers to link the physical 'embodied' world to 3D simulations of environments in order to eliminate the need for vision-based interfaces. Using rapid prototyping with a combination of 3D printing and low-cost computing devices (such as an Arduino board and an Adafruit Absolute Orientation Sensor), it is possible to create a controller object with a button-based interface. This device can then be connected wirelessly to a 3D simulation of a physical environment developed in Unity and deployed on a mobile phone. Feedback to the user can be provided through audio and haptic cues in order to avoid the need for a vision-based interface.

Aim and Outline
The aim of the project is to create a prototype interactive controller object and to connect it to a 3D simulation of a physical location deployed on a mobile phone. The project will explore:
* The features required for the controller object,
* The features of the 3D environment that need to be modelled (in this case the proof of concept study will be of the sensiLab area),
* Creating a simulation that runs on a mobile phone platform and that will receive motion data from the controller object,
* The interface requirements for the controller object and the 3D simulation needed to cater for the vision impaired users with respect to audio and tactile feedback,
* User testing of the proposed system.
This will enable vision impaired people to explore the layout and important features of the location before visiting it in person. Ideally, a person will be able to download the 3D simulation of any physical location they intend to visit to their mobile phone and to explore or refresh their understanding of the layout of the space. Possible applications include modelling public locations, such as transport hubs, government buildings that provide services and work environments.

This project is based in the sensiLab research workshop at the Caulfield campus. If you are accepted for this project you will need to work regularly within the lab to access the equipment and facilities needed to develop the project.

URLs and References
Adafruit BNO055 Absolute Orientation Sensor, https://learn.adafruit.com/adafruit-bno055-absolute-orientation-sensor/overview
Talking d20 20-Sided Gaming Die, https://learn.adafruit.com/talking-d20-20-sided-gaming-die/overview
Roll-a-ball tutorial, https://unity3d.com/learn/tutorials/projects/roll-ball-tutorial

Pre- and Co-requisite Knowledge
Explicit knowledge of any specific technologies is not required; however, the student must be prepared to investigate and use any new technologies that may be suitable for the project. Technologies will most likely include:
* 3D printing, including basic modelling,
* Low-cost computing and components, such as Arduino boards and gyroscope sensors,
* Unity interactive environment development and deployment on a mobile platform.


Interactive and exploratory visualisation of time series data
Supervisors: Zahraa Abdallah, Minyi Li

Background
A time series is a sequence of observations taken sequentially in time. Data appear as time series in almost every application domain, e.g., daily fluctuations of the stock market, traces of dynamic processes and scientific experiments, medical and biological experimental observations, various readings obtained from sensor networks, position updates of moving objects in location-based services, etc. As a consequence, in the last decade there has been a dramatic increase in interest in techniques for time series mining and forecasting. The very first step in understanding time series data is to visualise it. Exploratory time series visualisation enables essential tasks in analysis. With interactive and exploratory visualisation, we will be able to answer questions such as:
- How similar are different time series?
- Are there any spikes in the data?
- Is there any pattern that we can extract?
- Can we notice possible shifts between similar time series?

Aim and Outline
Our aim in this project is to build a web-based interactive tool to visualise time series data. The tool will provide basic exploratory data analysis and statistics of time series at different time granularities using various features. We will also investigate methods to find similarities and differences between time series using a set of metrics such as Euclidean distance. Many publicly available time series datasets can be used in this project, such as traffic data, weather data, and stock market data.
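
Underneath such a tool sits a similarity computation like the one sketched below: z-normalise two series and compare them with Euclidean distance using NumPy. The synthetic series are illustrative only; a web front end would visualise results of this kind interactively.

# Z-normalised Euclidean distance between time series (synthetic data).
import numpy as np

def znorm(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def euclidean(a, b):
    return float(np.linalg.norm(znorm(a) - znorm(b)))

t = np.linspace(0, 4 * np.pi, 200)
series_a = np.sin(t)
series_b = np.sin(t + 0.3) * 5 + 10      # same shape, shifted and rescaled
series_c = np.random.default_rng(0).normal(size=t.size)

print("a vs b:", round(euclidean(series_a, series_b), 2))   # small: similar shapes
print("a vs c:", round(euclidean(series_a, series_c), 2))   # large: unrelated series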

URLs and References
Time series data: https://datamarket.com/data/list/?q=provider:tsdl

Pre- and Co-requisite Knowledge
The student must have experience in programming.


Understanding your preferences/choices/decisions
Supervisors: Minyi Li, Zahraa Abdallah

Background
Whether you realise it or not, preferences are everywhere in our daily lives. They occur as soon as we are faced with a choice problem.
* It could be as simple as a pairwise comparison involving only a single decision variable, e.g., “I preferred to have red wine rather than white wine for dinner tonight”;
* or it could involve multiple decision criteria, e.g., “which mobile and internet bundle deal would you prefer?”;
* most often, preferences are conditional, i.e., the attributes for making a decision/choice can depend on each other. As an extension of the wine example, your preference over wine could depend on your choice of main meal: you may prefer white wine to red wine if you are going to have fish, or the reverse if you are having beef as the main course.
Understanding and predicting users' preferences plays a key role in various fields of application, e.g., recommender systems, adaptive user interface design, general product design and brand building, etc. However, in real-world decision problems, users' preferences are usually very complex, i.e., they generally involve multiple decision criteria and an exponential number of choices. This makes investigating preferences directly through preference relations/rankings over the entire choice space ineffective and infeasible. Therefore, more efficient ways of understanding a user's preference through its structure and the interactions between decision variables are essential.

Aim and Outline
In this project, we will investigate methods to construct the structure of user preferences and understand the interactions between decision variables from data. This involves learning from observations that reveal information about the preference structure of an individual or a class of individuals, and building models that generalise beyond such training data. Research might involve learning preference structures from real-world data sets, including Netflix movie rating data, car preference data, etc.
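
One simple way to learn from pairwise preference data, sketched below, is to fit a logistic regression on feature differences so that a positive score means the first item is preferred. The items, features and comparisons are invented for illustration and are not part of the project's data sets.

# Toy preference learning from pairwise comparisons via logistic regression on feature differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

# items described by [price, rating, battery_life] (hypothetical features)
items = {
    "phone_1": np.array([900.0, 4.5, 20.0]),
    "phone_2": np.array([500.0, 4.0, 30.0]),
    "phone_3": np.array([300.0, 3.5, 25.0]),
}
# observed comparisons: (preferred, other)
comparisons = [("phone_2", "phone_1"), ("phone_2", "phone_3"), ("phone_3", "phone_1")]

X, y = [], []
for a, b in comparisons:
    X.append(items[a] - items[b]); y.append(1)   # a beats b
    X.append(items[b] - items[a]); y.append(0)   # symmetric negative example

model = LogisticRegression().fit(np.array(X), np.array(y))
print("attribute weights:", model.coef_[0])
pair = items["phone_2"] - items["phone_1"]
print("P(phone_2 preferred to phone_1) =", model.predict_proba([pair])[0, 1])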

URLs and References
https://en.wikipedia.org/wiki/Preference_learning

Pre- and Co-requisite Knowledge
The student must have experience in programming.


Sentiment Analysis in Education and e-Government
Supervisors: Dr. Chris Messom, Dr. Yen Cheung

Background
Evidence of the influence of people's opinions on the types of products and services that will be offered is emerging from the fast-growing research in affective computing and sentiment analysis. In particular, mining sentiments over the Web for commercial, higher education and government intelligence applications is gaining research momentum. Current approaches to affective computing and sentiment analysis fall into three main categories: knowledge-based techniques, statistical methods and hybrid methods. Whilst the knowledge-based approach is popular with unambiguous text, it does not handle the semantics of natural language or human behaviour very well. Similarly, statistical methods are also semantically weak and usually require a large text input to effectively classify text. The hybrid approach combines both techniques to infer meaning from text.

Aim and Outline
This project aims to develop a sentiment harvesting model/system to evaluate Educational Systems and e-Government Systems using the hybrid approach.

URLs and References
Adinolfi, Paola, Ernesto D'Avanzo, Miltiadis D. Lytras, Isabel Novo-Corti and Jose Picatoste. "Sentiment Analysis to Evaluate Teaching Performance." IJKSR 7.4 (2016): 86-107. Web. 8 Feb. 2017. doi:10.4018/IJKSR.2016100108

E. Cambria, "Affective Computing and Sentiment Analysis," in IEEE Intelligent Systems, vol. 31, no. 2, pp. 102-107, Mar.-Apr. 2016.

Cambria, E., Grassi, M., Hussain, A., & Havasi, C. (2012). Sentic computing for social media marketing. Multimedia Tools and Applications, 59(2), 557-577. doi:http://dx.doi.org.ezproxy.lib.monash.edu.au/10.1007/s11042-011-0815-0

M Araujo et al, “iFeel: A system that compares and combines sentiment analysis methods”, Proc. 23rd Int'l Conf. World Wide Web, 2014, pp. 75-78.

Pre- and Co-requisite Knowledge
Some basic knowledge of AI is preferred (such as completion of an AI unit at undergraduate level). Otherwise, a highly enthusiastic and keen novice AI researcher may also be suitable for this project.


Cloud ERP: An organisational motivation and learning perspective
Supervisors: Dr Mahbubur Rahim, Dr Sue Foster and Dr Taiwo Oseni (External)

Background
The scepticism and uncertainty that finance executives initially felt about moving their mission-critical enterprise systems to the cloud is gradually fading and is now being replaced by a growing enthusiasm for the financial flexibility and freedom that comes from using the cloud's modular, pay-as-you-go approach to accessing the latest technology innovations (Miranda, 2013). In the last 4 to 5 years, several surveys have detailed why CFOs are increasingly open to moving their enterprise applications into the cloud. For example, a 2012 survey by Gartner, a Financial Executives Research Foundation (FERF) and technology advisory firm, found that 53 percent of CFOs expect up to a 12% rise in the number of enterprise transactions delivered through software-as-a-service over the next four years (Gartner, 2012).
Now, 5 years later, it would be worthwhile to investigate the success of the move of enterprise transactions to the cloud. Has this 12% (or even more) been achieved as expected? Or has a hybrid solution, where the most critical and resource-demanding modules are kept on premise while less critical ones are deployed on a public cloud, been a more appropriate solution?

Aim and Outline
Recent studies such as (Mezghani, 2014) now report organisational intentions to switch from on-premise to cloud ERP, identifying the antecedents and determinants of the decision. This study will explore, within a large organisation setting, organisational motivation and organisational learning associated with moving ERP business processes to the cloud. Organisational motivation and organisational learning attributes can be helpful for understanding why and how organisations migrate towards cloud-based ERP. The study will aim to investigate:
* How does organisational motivation influence how ERP-based business processes are migrated to a cloud ERP model?

URLs and References
Miranda, S. (2013). ERP in the cloud: CFOs see the value of running enterprise applications as a service. Financial Executive, 29(1), 65-67.
Mezghani, K. (2014). Switching toward cloud ERP: A research model to explain intentions. International Journal of Enterprise Information Systems, 10, 46+.
Gartner. (2012). CEO survey 2012: The year of living hesitantly. Retrieved from https://www.gartner.com/doc/1957515/ceo-survey--year-living

Pre-requisite
Either completion or near completion of a bachelor degree in IS/IT


Cloud ERP in SMEs: The role of consultants
Supervisors: Dr Mahbubur Rahim, Dr Sue Foster and Dr Taiwo Oseni (External)

Background
While cloud computing is receiving increased research focus in recent times, cloud ERP is arguably one of the most valuable and influential SaaS applications available in the market, and as such should be the next wave of ERP and cloud research (Chao, Peng, & Gala, 2014).
Given the importance of ERP systems, the inherent complexity of the supported business processes, the functionality of the software, the amount of data processed, and the financial flexibility and freedom that comes from using the cloud's modular, pay-as-you-go approach to accessing the latest technology innovations, SMEs are well-suited to exploit the best practices that cloud ERPs support (Johansson, Alajbegovic, Alexopoulos, & Desalermos, 2014; Miranda, 2013).
The best practices of cloud ERPs, and immediate access to infrastructure and software (factors that result in the fast deployment of cloud-based ERP), permit SMEs, which typically have fewer and simpler activities, to deploy and utilize a constantly maintained and updated cloud ERP solution. The vendor, who maintains the system, can also guarantee its optimal use, ensuring business continuity. However, many cloud providers offer such high security levels for their products that SMEs cannot manage the implementation themselves (Johansson et al., 2014). As such, many SMEs require an ERP consultant's help in migrating to cloud ERP.

Aim and Outline
The aim of the study is to document the experience of an SME and capture relevant lessons learned. Such lessons can be useful for enhancing cloud ERP research in addition to guiding other organisations who are considering a similar move. The study will investigate:
* What role do ERP consultants play in assisting SMEs in deploying cloud ERP?

URLs and References

Chao, G., Peng, A., & Gala, C. (2014). Cloud ERP: A new dilemma to modern organisations? Journal of Computer Information Systems, 54(4), 22-30.
Johansson, B., Alajbegovic, A., Alexopoulos, V., & Desalermos, A. (2014). Cloud ERP adoption opportunities and concerns: A comparison between SMEs and large companies. Paper presented at the Pre-ECIS 2014 Workshop "IT Operations Management" (ITOM2014).
Miranda, S. (2013). ERP in the cloud: CFOs see the value of running enterprise applications as a service. Financial Executive, 29(1), 65-67.

Pre-requisite

Either completion or near completion of a bachelor degree in IS/IT


Semi-supervised learning for activity recognition from mobile sensors
Supervisors: Zahraa S. Abdallah

Background
The availability of real-time sensory information through mobile sensors has led to the emergence of research into “Activity Recognition” (AR). Activity recognition aims to provide accurate and opportune information based on people's activities and behaviours. Many applications have demonstrated the usefulness of activity recognition, including health and wellness, activity-based crowdsourcing and surveillance, and targeted advertising. Researchers have addressed many challenges in activity recognition, while many other challenges are yet to be resolved. State-of-the-art activity recognition research has focused on traditional supervised learning techniques. However, there is typically only a small set of labelled training data available in addition to a substantial amount of unlabelled training data. The process of annotating such data or finding ground truth is tedious, time-consuming, error-prone, and may even be impossible in some cases. Thus, semi-supervised, active and incremental learning are increasingly being investigated for activity recognition to overcome the scarcity of annotated data.

Aims and Outline
We aim first to survey semi-supervised and unsupervised learning approaches for activity recognition. Then, we will develop a new technique that incorporates active and incremental learning for more accurate activity recognition with limited availability of labelled data.
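
The sketch below illustrates the semi-supervised setting with scikit-learn's label propagation: only a handful of activity windows are labelled and the rest are marked with -1. The synthetic accelerometer-like features and the two activity classes are assumptions for illustration only.

# Semi-supervised activity recognition sketch: propagate a few labels through feature space.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
# one feature vector per sensing window, e.g. [mean magnitude, std magnitude]
walking = rng.normal([1.2, 0.8], 0.1, size=(50, 2))
sitting = rng.normal([0.2, 0.1], 0.05, size=(50, 2))
X = np.vstack([walking, sitting])
y = np.full(100, -1)            # -1 marks unlabelled windows
y[:3], y[50:53] = 0, 1          # only 3 labelled examples per activity

model = LabelPropagation().fit(X, y)
print("inferred labels for some unlabelled windows:",
      model.transduction_[3:8], model.transduction_[53:58])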

URLs and References
https://en.wikipedia.org/wiki/Semi-supervised_learning
https://en.wikipedia.org/wiki/Activity_recognition
https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Pre- and Co-requisite Knowledge
The student must have experience in programming.


Detecting insider threats in streaming data
Supervisors: Zahraa Abdallah, Geoff Webb

Background
Insider threat detection is a growing concern around the globe as a result of the increasing number of insider attacks in recent years. In the Cyber Security Watch Survey, the statistics revealed that 21% of attacks are insider attacks. For instance, the insider attack by Edward Snowden was reported as the biggest intelligence leak in the US. This attack maps to an IP theft scenario, in which Snowden disclosed 1.7 million classified documents from the National Security Agency to the mass media. We address the insider threat problem as a stream mining problem over data streams generated from security logs, network data, and email headers. The challenge here is to distinguish between a normal change in an insider's behaviour and the evolution of a new concept that may be an indication of a malicious insider threat.

Aims and Outline
In this project, we aim to apply different stream mining techniques to detect insider threats using Internet logs. The main challenge is to discover the evolution of new concepts and distinguish between normal behaviour and threats in data streams. Streaming data typically arrive at high speed and require real-time analysis. Thus, the efficiency of the applied techniques is a crucial factor to consider.
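
As a toy example of stream processing over such logs, the sketch below flags points in a stream of per-user event counts that deviate strongly from a recent sliding window. The threshold and data are illustrative only; a real system would need far richer features and concept-drift-aware models.

# Sliding-window anomaly flagging over a toy event-count stream.
from collections import deque
import math

def stream_anomalies(stream, window=20, threshold=3.0):
    recent = deque(maxlen=window)
    for t, x in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            std = math.sqrt(var) or 1.0
            if abs(x - mean) / std > threshold:
                yield t, x                      # flag as a possible threat
        recent.append(x)

normal = [10, 12, 9, 11, 10, 13, 12, 11] * 5
stream = normal + [250] + normal                # sudden burst of downloads
print(list(stream_anomalies(stream)))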

URLs and References
http://www.cert.org/insider-threat/research/cybersecurity-watch-survey.cfm?
https://en.wikipedia.org/wiki/Data_stream_mining

Pre- and Co-requisite Knowledge
The student must have experience in Java programming.


Inferring Concurrent Specifications for a Sequential Program
Supervisors: Chris Ling, Yuan-Fang Li

Background
It is non-trivial to automatically exploit potential parallelism present in source code written in mainstream programming languages such as Java and C#. One of the main reasons is the implicit dependencies between shared mutable states of data. In these languages, compilers follow the (sequential) execution order in which the program is written in order to avoid side effects. Therefore, programmers need to write parallel programs in order to exploit the computing power offered by the now-prevalent multi-core architecture.

It is generally acknowledged that writing parallel programs using multithreading is a difficult and time-consuming task due to errors such as race conditions and deadlocks. Therefore, there is a substantial need for methods and tools for the automated exploitation of parallelism.

Aims and Outline
In order to help programmers to reason about concurrency, researchers have developed a number of abstractions called 'Access Permissions'. Access Permissions characterise the way multiple threads can potentially access a shared state. Our goal is to develop techniques that can automatically infer implicit dependencies (read/write behaviours) from a sequential Java program. Such dependency information can eventually be used to automatically parallelise the execution of these programs instead of requiring programmers to write concurrent programs using multi-threading.

We have already developed a high-level algorithm to infer dependencies. In this project, we aim to refine the algorithm and develop an Eclipse plugin that implements our proposed technique.

URLs and References

[1] Kevin Bierhoff, Nels E. Beckman, and Jonathan Aldrich. Practical API protocol checking with access permissions. In ECOOP, pages 195- 219, 2009.
[2] John Boyland. Checking interference with fractional permissions. In Static Analysis, volume 2694 of Lecture Notes in Computer Science, pages 55-72. Springer Berlin Heidelberg, 2003.
[3] Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., & Aldrich, J. (2014). AEminium: A permission-based concurrent-by-default programming language approach. ACM Transactions on Programming Languages and Systems (TOPLAS), 36(1), 2.

Pre- and Co-requisite Knowledge
* Good programming skills in Java or a similar object-oriented language
* Knowledge of basic object oriented constructs
* Knowledge of graphs as a data structure


Optimising of multi-telescope observations of gravitational wave events
Supervisors: Assoc. Prof. David Dowe, Dr. Evert Rol, Dr. Duncan Galloway (School of Physics & Astronomy, Faculty of Science)

Background
With the first detections of gravitational waves established, the challenge now for astrophysicists is to find their electromagnetic counterparts. Detection of these elusive counterparts will confirm and constrain the models for the progenitors of gravitational waves. Details such as distance confirmation, the (electromagnetic) energy emitted, and the environment of the gravitational wave (GW) event provide the necessary information to model the scenario that led to such a catastrophic event.

Finding these counterparts is challenging, as the current localisation by GW detectors often yields a search area of hundreds of square degrees, often in disparate areas of the sky. Coordinated follow-up observations are essential, especially for small field-of-view telescopes. In particular at optical wavelengths, where a multitude of small field-of-view telescopes exists, uncoordinated observations may result in many duplicated efforts while missing large portions of the localisation area.

Aims and Outline
For this project, we have developed an approach involving genetic algorithms to optimise the search for counterparts in such cases. This way, we can easily incorporate constraints such as the area visibility per telescope, differences in fields of view, or the expected brightness evolution of the counterpart.

A potential disadvantage of using a genetic algorithm is that such an algorithm is generally slower than other optimisation algorithms. Searches for counterparts, however, generally need to start as soon as possible after the GW event, and even a 15-minute delay may mean valuable information is lost.

The goal of the project is then to find the best set of algorithm parameters for a wide set of scenarios, so that we can create a fast and flexible scheduling tool.

In short:
- improve the algorithm, in particular its speed. This can be done both by improving the actual code and by tuning the algorithm parameters;
- make the algorithm (fitness function) more flexible, to easily incorporate a wide variety of (observing) constraints;
- compare a variety of observing scenarios, to determine where the largest improvements can be made in scheduling.
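
To make the genetic-algorithm setting concrete, here is a minimal sketch that evolves a set of sky tiles so the summed localisation probability covered by a fixed number of pointings is maximised. The probability map, population size and mutation rate are invented, and the real scheduler would add visibility windows, multiple telescopes and brightness evolution.

# Minimal genetic algorithm over sky tiles (all parameters and the probability map are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_tiles, n_point = 200, 20
prob_map = rng.dirichlet(np.ones(n_tiles))        # fake GW localisation probability per sky tile

def fitness(ind):                                  # ind: array of tile indices to observe
    return prob_map[np.unique(ind)].sum()          # observing a tile twice earns nothing extra

def mutate(ind, rate=0.1):
    ind = ind.copy()
    for i in range(len(ind)):
        if rng.random() < rate:
            ind[i] = rng.integers(n_tiles)         # repoint one observation at a random tile
    return ind

pop = [rng.integers(n_tiles, size=n_point) for _ in range(50)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                             # truncation selection
    children = []
    while len(children) < 40:
        a, b = rng.choice(10, size=2, replace=False)
        cut = int(rng.integers(1, n_point))
        child = np.concatenate([parents[a][:cut], parents[b][cut:]])   # one-point crossover
        children.append(mutate(child))
    pop = parents + children

best = max(pop, key=fitness)
print("probability covered by best schedule:", round(float(fitness(best)), 3))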

URLs and References
- Example (simulated) GW location maps: http://www.ligo.org/scientists/first2years/#2016
- Current follow-up observations of the first GW event: https://dcc.ligo.org/public/0122/P1500227/012/GW150914_localization_and_followup.pdf (in particular Figure 3)
- Earlier work done on telescope scheduling with genetic algorithms (but only for point sources): http://rts2.org/scheduling.pdf

Pre- and Co-requisite Knowledge
- (Heuristic) optimisation
- programming languages: Python, C
- Affinity with astronomy (there are no particular astrophysical requirements)


Storytelling with Outdoor Augmented Reality
Supervisor: Bernie Jenny

Background
Telling stories with Augmented Reality (AR) is still in an early, exploratory phase. Storytelling can be used to guide a user through a physical environment and develop a narrative. It is expected that one of the ultimate uses of AR technologies will be as a new form of location-based media that enables new storytelling experiences.

Aims and Outline
The goal of this project is to develop a prototype app for mobile phones or tablets for a part of Melbourne. The app will guide users along a route using state-of-the-art visualisation technology to develop a captivating narrative. The app will use computer vision algorithms to track physical features, such as building facades, monuments, and other stationary objects in selected locations. The tracked locations will be the stages for the story and will be enhanced with 3D and multimedia elements to convey an immersive experience.

Pre- and Co-requisite Knowledge
Java, C++, or C#


Immersive geovisualisation
Supervisor: Bernie Jenny

Background
Little is known about how quantitative geodata are most effectively displayed in a 3D visualisation. Better visualisation methods are required for diverse data, such as air pollution, noise levels, bushfire propagation, or toxic gas emissions at waste disposal sites.

Aims and Outline
The goal of this project is to develop new immersive 3D visualisation methods that are accurate, effective, and unambiguous to read. The visualisations should be applicable to AR (Augmented Reality), VR (Virtual Reality) and interactive 3D maps. You will draw inspiration from light painting art projects for placing bars and profiles in a 3D scene to visualise quantitative geodata. User acceptance, effectiveness, and efficiency can be evaluated through expert feedback and/or a user study.

Pre- and Co-requisite Knowledge
Computer graphics/OpenGL


Relief Shading for Google Maps
Supervisor: Bernie Jenny

Background
Google are sponsoring this research project to develop a new method for shading terrain for their maps. Relief shading is an effective and widely used method to show hills and valleys with subtle brightness gradients in maps. Continuous-tone raster images are traditionally used to store and render shaded relief. However, the latest version of Google Maps uses client-side rendering of vector data, which results in less informative and less pleasing shaded relief.

Aims and Outline
The goal of this project is to develop a method for rendering continuous-tone shaded relief images for web maps from terrain skeletal lines. A first prototype was developed [1] using diffusion shading, which is an efficient way to create smooth colour gradients from vector data. You may either (1) use the new WebGL 2 standard for browsers to render shaded relief with the diffusion shading method for web maps; or (2) improve the prototype to diffuse illumination directions instead of greyscale brightness values.

URLs and References
[1] Marston, B. E. and Jenny, B. (2015). Improving the representation of major landforms in analytical relief shading.
https://www.researchgate.net/publication/277576153_Improving_the_representation_of_major_landforms_in_analytical_relief_shading
Also see the related project “Fast extraction of ridgelines from terrain models”

Pre- and Co-requisite Knowledge
OpenGL for option (1), Java for option (2)


Fast extraction of ridgelines from terrain models
Supervisor: Bernie Jenny

Background
Topographic structures, such as ridgelines or valley lines, can be extracted from digital elevation models. “Maximum branch length” is an algorithm for extracting high-quality ridgelines [1] that are used for creating shaded relief images [2] or for hydrographic analyses. An open-source implementation is available in Whitebox [3]. Unfortunately the maximum branch length algorithm is very slow; it can take several hours to compute the ridgelines for a relatively small terrain model with, for example, 5000 × 5000 values.

Aims and Outline
The maximum branch length algorithm traces flow paths on the terrain (that is, lines of steepest flow that a drop of water would follow). To accelerate the extraction of flow paths, you will replace the pixel-based tracing algorithm with a directed graph constructed from the terrain model. Instead of a pixel-by-pixel tracing of the water flow on the raster terrain model, you will use Dijkstra's shortest path algorithm to extract flow paths and ridgelines.
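
A minimal sketch of the graph-based idea is given below: Dijkstra's algorithm (via a binary heap) computes flow-path distances on a tiny hand-made directed graph. The graph, node names and weights are illustrative; in practice the graph would be built from the elevation model's flow directions.

# Dijkstra's shortest path over a toy flow graph.
import heapq

def dijkstra(graph, source):
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue                              # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist

# toy directed graph: cell -> [(downstream cell, flow-path length), ...]
graph = {
    "ridge": [("slope_a", 1.0), ("slope_b", 1.4)],
    "slope_a": [("valley", 2.0)],
    "slope_b": [("valley", 1.0)],
}
print(dijkstra(graph, "ridge"))   # shortest flow-path distances from the ridge cell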

URLs and References
[1] Lindsay, J.B. and Seibert, J., 2013. Measuring the significance of a divide to local drainage patterns. International Journal of Geographical Information Science, 27 (7), 1453–1468.
[2] Marston, B. E. and Jenny, B. (2015). Improving the representation of major landforms in analytical relief shading.
https://www.researchgate.net/publication/277576153_Improving_the_representation_of_major_landforms_in_analytical_relief_shading
[3] Whitebox: http://www.uoguelph.ca/~hydrogeo/Whitebox/
Also see the related project “Relief Shading for Google Maps”

Pre- and Co-requisite Knowledge
Ideally Java programming


Interactive video maps
Supervisor: Bernie Jenny

Background
Video embedded in geographic web maps is a recent development [1]. An example of an interactive video map is at http://co2.digitalcartography.org. The central point of this map can be moved and the content of the map can be adjusted as the movie plays.

Aims and Outline
The goal of this project is to develop additional interactive features that allow the map user to better understand the data in the video. You will develop user-adjustable tools for high-resolution video streams for (1) classification, colouring, and filtering of values, (2) interactive probing of values, linked to dynamic diagrams, or (3) visualising a geospatial video as an animated three-dimensional surface. A robust method for coding numerical data values in a coloured video stream needs to be developed. WebGL and HTML5 video can be used to stream, decode and colour/filter/analyse the data.

URLs and References
[1] Jenny, B., Liem, J., Šavric, B. and Putman, W. M. (2016). Interactive video maps: A Year in the Life of Earth’s CO2. Journal of Maps, 12(sup1), 36–42. www.researchgate.net/publication/297684678_Interactive_video_maps_A_year_in_the_life_of_Earth%27s_CO
Interactive example video map: http://co2.digitalcartography.org

Pre- and Co-requisite Knowledge
Web programming, ideally OpenGL or WebGL


Adaptive composite projections for geographic maps
Supervisor: Bernie Jenny

Background
Adaptive composite map projections combine several projections and adapt the map's geometry to the map scale, the map's height-to-width ratio, and the central latitude of the displayed area [1, 2]. Multiple projections are combined and their parameters adjusted to create seamless transitions as the user zooms or pans the map. Unlike the Mercator projection (which is currently used by all major web maps), composite projections can show the entire globe including the poles, and they show geography without area distortion.

Aims and Outline
This project has two parts. (1) Develop version 2 of adaptive composite map projections using a new, recently discovered method for transitioning between projections. (2) Develop a plug-in with adaptive composite map projections for D3, the leading visualisation library for the web [3]. You will collaborate with developers at Esri Inc., the biggest manufacturer of geographic information systems, who are currently adding adaptive composite map projections to their ArcGIS software.

URLs and References
[1] Jenny, B. (2012). Adaptive composite map projections. IEEE Transactions on Visualization and Computer Graphics, 18-12, p. 2575–2582.
[2] http://cartography.oregonstate.edu/ScaleAdaptiveWebMapProjections.html
[3] https://d3js.org

Pre- and Co-requisite Knowledge
Web programming


Three-dimensional geographic flow maps
Supervisor: Bernie Jenny

Background
Flow maps show the direction and volume of moving goods, ideas, money, etc. between places [1, 2]. Flow maps are rarely rendered as three-dimensional objects, or used in immersive 3D visualisation.

Aims and Outline
The goal of this project is to develop methods for the visualisation of geographic flows in three dimensions. Various options can be explored: (1) develop an algorithm for 2D maps that arranges flows on the z-axis and renders them with a ray-tracer; (2) conduct a user study to compare 2D and 3D flow maps; (3) develop a program with Unity to visualise 3D flows with immersive visualisation (head mounted displays or the Monash CAVE2)

URLs and References
[1] Jenny, B. et al. (2017). Design principles for origin-destination flow maps. Cartography and Geographic Information Science.
[2] Jenny, B. et al. (2017). Force-directed layout of origin-destination flow maps. International Journal of Geographical Information Science.
http://monash.edu.au/cave2

Pre- and Co-requisite Knowledge
C#, C++, or Java for option (3).


Georectification of old maps
Supervisor: Bernie Jenny

Background
Many map libraries are currently scanning their old maps to better protect these often fragile and precious items. Historians use the scanned maps in geographic information systems and combine them with other geospatial data. To make this possible, old maps are georectified, that is, the old map is deformed to align with the coordinate system of a modern map.

Aims and Outline
Various methods exist for georectifying raster images, but they all deform text labels and map symbols. This project aims at developing a georectification method that preserves text labels, point symbols and lines. You will adapt the moving least-squares method, which is used in computer graphics for reconstructing a surface from a set of points. MapAnalyst, a tool for the analysis of old maps, can be extended with this new method.

URLs and References
http://mapanalyst.org
Jenny, B. and Hurni, L. (2011). Studying cartographic heritage: analysis and visualization of geometric distortions. Computers & Graphics, 35-2, p. 402–411.

Pre- and Co-requisite Knowledge
Java programming


Graduated dot maps
Supervisor: Bernie Jenny

Background
Dot maps show quantities on maps, for example, each dot can represent 1000 people. Extracting quantities from dot maps is difficult, because dots often overlap in dense areas. Graduated dot maps are a recent improvement that use dots of variable size, for example, a small dot for 200 people, a medium dot for 1000 people, and a large dot for 10,000 people. It has been shown that with graduated dot maps quantities are easier to estimate. A recently suggested method (in review) uses (a) the capacity-constrained Voronoi tessellation (CCVT) algorithm to place dots in a “nice” blue-noise pattern without overlaps, and (b) the DBSCAN clustering algorithm to identify dense clusters of dots. The dots identified by the DBSCAN algorithm are replaced with larger dots.

Aims and Outline
This project has four components: (1) The method outlined above can result in dots being placed in areas that make no sense (e.g. dots for people placed on oceans). The CCVT algorithm needs to be modified to prevent this. (2) The DBSCAN clustering algorithm can be simplified. (3) A plug-in for a geographic information system, such as ArcGIS or QGIS, can be created to make this method available to map makers. (4) Sample maps should be created to evaluate the method and the plug-in.
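
Step (b) of the method can be sketched with scikit-learn's DBSCAN, as below: cluster the dot locations and replace each dense cluster by one larger dot at its centroid. The coordinates and the DBSCAN parameters (eps, min_samples) are illustrative only.

# Replace dense clusters of dots with single larger dots (illustrative parameters).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense = rng.normal([0, 0], 0.05, size=(40, 2))        # a crowded area
sparse = rng.uniform(-2, 2, size=(15, 2))             # scattered rural dots
dots = np.vstack([dense, sparse])

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(dots)

graduated = []
for lab in set(labels):
    members = dots[labels == lab]
    if lab == -1:                                     # noise points: keep as small dots
        graduated += [(xy, "small") for xy in members]
    else:                                             # dense cluster: one large dot at its centroid
        graduated.append((members.mean(axis=0), "large"))

print(sum(1 for _, size in graduated if size == "large"), "large dots,",
      sum(1 for _, size in graduated if size == "small"), "small dots")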

URLs and References
Balzer, M., Schlömer, T. and Deussen, O., 2009. Capacity-constrained point distributions: A variant of Lloyd's method. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2009), ACM, 28 (3), article 86, 8 pages.
Ester, M., Kriegel, H., Sander, J. and Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. 226–231.

Pre- and Co-requisite Knowledge
Preferably Python


Implementing Trust Decisions in Distributed Computing
Supervisor: Carsten Rudolph

Background
Distributed computing models are omnipresent in modern service architectures, with an abundance of protocols that endeavour to attribute non-functional properties such as reliability, trust, or security to them. The concept of trust is fundamental to our usage of and reliance on computers in our daily life. However, trust is often a matter of fact one has to accept, rather than an informed decision of an individual to trust their computer to correctly calculate a desired result. In a distributed system, especially when the computers involved do not belong to a single individual or an individual can choose which computers should do their calculations, trust adds meaning to a result and becomes mission critical. The current body of research on secure distributed systems is focused on designing verifiably secure protocols to guarantee the aforementioned properties for communication. Properties like trust and reliability cannot be based on any particular protocol; a protocol can merely strive to render any meddling with communication between cooperating platforms ineffective. This project will work with definitions of trust in distributed systems and apply a novel formalism for reasoning about trust based on platforms and computations, with application to popular scenarios for distributed systems.

Aims and Outline
The objective is to build a tool demonstrating the feasibility of automating trust decision processes in future distributed computing scenarios.
You will be able to dive into the latest concepts of trusted computing, understand them, and learn their applications in current and future systems, extend and apply your knowledge of distributed systems, and work on novel formalisms and techniques for reasoning with them.

You will be using programming tools like Scala (Java's younger, smarter sibling) in combination with Akka (a recent framework for powerful reactive, concurrent, and distributed applications) for implementing distributed algorithms, together with languages like Python with interfaces to logic programming for static reasoning. This project will allow you to broaden your knowledge in systems security and use your expertise as a software developer and computer scientist by bringing scientific reasoning and efficient use of security mechanisms to real-world scenarios.

URLs and References
Position paper on Trust: https://pdfs.semanticscholar.org/e9e5/42fe723f74f8b8db4f8d9a400ee178dcdc9b.pdf
Formalisms for Trust Management: http://homes.cs.washington.edu/~pedrod/papers/iswc03.pdf

Pre- and Co-requisite Knowledge
This project requires basic knowledge on distributed systems/computer networks and IT security and good programming skills.


A Cyber Security Requirements Model for the Monash Micro-Grid
Supervisor: Carsten Rudolph and Ariel Liebman

Background
In the formal definition of the U.S. Department of Energy, a microgrid is a group of interconnected loads and distributed energy resources (DERs) within defined electrical boundaries that acts as a single controllable entity with respect to the grid. A microgrid can connect to and disconnect from other, bigger grids, enabling it to operate in both stand-alone and grid-connected modes. During disturbances, the generation and corresponding loads can separate from a distribution system to isolate a microgrid's load from any disturbances without harming the grid's integrity. The ability to operate stand-alone, or in island mode, has the potential to provide higher local reliability than that provided by the power system as a whole.

These microgrids need extensive attention from the computer security community to make sure that cyber threats do not jeopardise requirements such as safety and reliability, both during their design and during their operation. In a broader context, the bigger power networks in which microgrids are embedded need the same attention to make sure that the decoupling and integration of individual microgrids does not harm other connected grids. Monash is part of a research initiative towards smart microgrids and new energy technologies in collaboration with the Clean Energy Finance Corporation (CEFC).

"Monash University is intent on developing innovative solutions to the challenges in energy and climate change facing our world,” stated Monash University President and Vice-Chancellor Professor Margaret Gardner.

Aim and Outline
The goal of this project is to improve the understanding of the security requirements of the future Monash electricity network. In order to develop this understanding, you will create a model of the network showing the main components and the processes within the network. Then, you will work with microgrid specialists to identify security requirements in terms of processes and data. The first result will be a formal or semi-formal model that provides a precise expression of security requirements at different levels. You will also be able to explore the suitability of approaches like business process modelling, formal modelling frameworks or more technical trust relation models to express the security requirements of such an infrastructure. As part of the ongoing research initiative towards smart microgrids, you will gain a unique insight into cyber security challenges with respect to trust and security in smart grids. Finally, the model will be used to evaluate the impact of possible security solutions.

URLs and References
Community Energy Networks With Storage
http://link.springer.com/10.1007/978-981-287-652-2

S. Gürgens, P. Ochsenschläger, and C. Rudolph.
On a formal framework for security properties
International Computer Standards & Interface Journal (CSI), Special issue on formal methods, techniques and tools for secure and reliable applications, 2004.
http://sit.sit.fraunhofer.de/smv/publications/download/CSI-2004.pdf

N. Kuntze, C. Rudolph, M. Cupelli, J. Liu, and A. Monti.
Trust infrastructures for future energy networks.
In Power and Energy Society General Meeting - Power Systems Engineering in Challenging Times, 2010.
http://sit.sit.fraunhofer.de/smv/publications/download/PES2010.pdf

Pre- and Co-requisite Knowledge
The project is suitable for students with cyber security knowledge and a sound knowledge of computer networks.


Independent component analysis for identifying patterns of household energy consumption
Supervisors: Lachlan Andrew and Asef Nazari

Background
As we move towards greater reliance on renewable energy, there is a greater need to shift electricity load to times when the sun is shining and the wind is blowing. A step towards achieving this is to understand current energy usage patterns. One way to do this, without invasively monitoring hundreds of houses, is to apply statistical pattern recognition techniques to large collections of houses.

Aim and Outline
This project will apply Independent Component Analysis (ICA), or alternatively non-negative matrix factorisation, to half-hourly electricity consumption data of thousands of houses to seek to identify components corresponding to tasks such as heating, cooling, and getting ready for work. The first steps will be to preprocess the data to reduce the computational burden, and to perform ICA. The next step will be to try to interpret the resulting components to infer the underlying causes of the energy consumption.
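
The sketch below illustrates the idea on synthetic data: a households-by-half-hours matrix is generated from two hidden usage patterns, FastICA from scikit-learn recovers them, and the peak time of each recovered pattern is reported. The "heating" and "morning routine" signals and the component count are assumptions for illustration only.

# FastICA on a synthetic (households x half-hours) consumption matrix.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.arange(48)                                        # half-hours in a day
heating = np.exp(-0.5 * ((t - 40) / 4.0) ** 2)           # evening heating bump
morning = np.exp(-0.5 * ((t - 15) / 3.0) ** 2)           # getting ready for work
mixing = rng.uniform(0, 1, size=(500, 2))                # how strongly each house uses each pattern
X = mixing @ np.vstack([heating, morning]) + 0.05 * rng.standard_normal((500, 48))

ica = FastICA(n_components=2, random_state=0)
strengths = ica.fit_transform(X)     # (households x components): per-house activity strength
patterns = ica.mixing_               # (half-hours x components): when each component is active
for k in range(patterns.shape[1]):
    print(f"component {k}: peak activity at half-hour {int(np.abs(patterns[:, k]).argmax())}")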

Pre- and Co-requisite Knowledge
The student should be comfortable with basic mathematics, including probability and linear algebra. The student will need to learn a language such as Matlab or R, or a package such as NumPy.


Deep Learning Methods in Wireless Communications
Supervisors: Amin Sakzad

Background
The problem of channel decoding of linear codes over Additive White Gaussian Noise (AWGN) channels using deep learning methods/techniques has been studied recently. This problem has also been considered for low-density parity-check (LDPC) codes and medium- to high-density parity-check (HDPC) codes [1,2,3,4]. Deep learning approaches have also proved useful in multiple-input multiple-output (MIMO) channel decoding [5].

Aim and Outline
This project aims at finding low-complexity, close-to-optimal channel decoding of lattices and lattice codes in wireless communications. We consider well-known short lattices such as Barnes-Wall lattices and higher-dimensional ones including LDPC lattices, LDLC, LDA, and Turbo lattices (see [6] and references therein). We may try different approaches to tackle these problems, including applying the above-mentioned techniques to the underlying label code of a lattice or applying deep learning methods to the corresponding trellis representation of the lattice (see [7] and references therein).
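
In the spirit of [1] and [3], the sketch below trains a small neural network to decode a (7,4) Hamming code transmitted with BPSK over an AWGN channel. The architecture, SNR and training schedule are illustrative assumptions; for lattices, the code would be replaced via the lattice's label code or trellis representation.

# Toy neural decoder for a (7,4) Hamming code over AWGN (PyTorch; all settings illustrative).
import torch
import torch.nn as nn

G = torch.tensor([[1,0,0,0,1,1,0],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,0,1,1],
                  [0,0,0,1,1,1,1]], dtype=torch.float32)   # (7,4) Hamming generator matrix

def batch(n, snr_db=2.0):
    msgs = torch.randint(0, 2, (n, 4)).float()
    codewords = (msgs @ G) % 2
    x = 1 - 2 * codewords                                   # BPSK: bit 0 -> +1, bit 1 -> -1
    sigma = 10 ** (-snr_db / 20)                            # rough noise level for illustration
    return x + sigma * torch.randn_like(x), msgs            # noisy channel output, target bits

model = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    y, msgs = batch(256)
    loss = loss_fn(model(y), msgs)
    opt.zero_grad(); loss.backward(); opt.step()

y, msgs = batch(10000)
ber = ((model(y) > 0).float() != msgs).float().mean()
print("bit error rate:", float(ber))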

URLs and References
[1] E. Nachmani, Y. Be’ery, and D. Burshtein, "Learning to decode linear codes using deep learning," in 54’th Annual Allerton Conf. On Communication, Control and Computing, September 2016, arXiv preprint arXiv:1607.04793.
[2] L. Lugosch and W. J. Gross, "Neural offset min-sum decoding," in 2017 IEEE International Symposium on Information Theory, June 2017, arXiv preprint arXiv:1701.05931.
[3] N. Farsad and A. Goldsmith, "Detection algorithms for communication systems using deep learning," arXiv preprint arXiv:1705.08044, 2017.
[4] L. Lugosch and W. J. Gross, "Neural offset min-sum decoding," in 2017 IEEE International Symposium on Information Theory, June 2017, arXiv preprint arXiv:1701.05931.
[5] N. Samuel, T. Diskin, and A. Wiesel, "Deep mimo detection," arXiv preprint arXiv:1706.01151, 2017.
[6] H. Khodaiemehr, M.-R. Sadeghi, and A. Sakzad, "Practical Encoder and Decoder for Power Constraint 1-level QC-LDPC Lattices," To appear in IEEE Trans. on Communications, DOI: 10.1109/TCOMM.2016.2633343.
[7] A.H. Banihashemi and F.R. Kschischang, "Tanner graphs for block codes and lattices: construction and complexity," IEEE Trans. Inform. Theory, vol. 47, pp. 822–834, 2001.

Pre- and Co-requisite Knowledge
Ability to write computer programs in Matlab, basic knowledge of deep learning and/or coding theory and/or lattices.


Integer-Forcing Linear Receivers for Multiple-Input Multiple-Output (MIMO) Channels
Supervisors: Amin Sakzad and Pierre Le Bodic

Background
A new architecture called the integer-forcing (IF) linear receiver has recently been proposed for multiple-input multiple-output (MIMO) fading channels, wherein an appropriate integer linear combination of the received symbols has to be computed as a part of the decoding process [1]. Methods based on lattice basis reduction algorithms have been proposed to obtain the integer coefficients for the IF receiver [2]. Connections between the proposed IF linear receivers and lattice-reduction-aided MIMO detectors (with equivalent complexity) have also been studied [2]. The concept of unitary precoded integer-forcing (UPIF) is also introduced and investigated in [3].

Aim and Outline
This project has two parts: (1) The problem of finding a suitable integer linear combination of the received symbols has only been addressed with respect to the \ell_2 norm; this project aims at solving the problem with respect to the \ell_1 norm. (2) The other problem, finding the best unitary precoder for integer-forcing, is a min-max optimization problem that also needs to be addressed. Both problems should be studied analytically and numerically using computer simulations.

URLs and References
[1] J. Zhan, B. Nazer, U. Erez, and M. Gastpar, "Integer-forcing linear receivers," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7661–7685, Dec. 2014.
[2] A. Sakzad, J. Harshan, and E. Viterbo, "Integer-forcing MIMO linear receivers based on lattice reduction," IEEE Trans. Wireless Commun., vol. 12, no. 10, pp. 4905–4915, Nov. 2013.
[3] A. Sakzad and E. Viterbo, "Full Diversity Unitary Precoded Integer-Forcing," IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4316–4327, Aug. 2015.

Pre- and Co-requisite Knowledge
Digital Communication, Integer Programming, Matlab


Vietnam: Mobile Apps to support academic libraries
Supervisors: Tom Denison, Pari Delir Haghighi

Background
Academic libraries in Vietnam work with very limited resources compared to their Australian counterparts and have a real need for tools that assist in either communicating more directly with students or improving productivity and search interfaces. There are three potential projects on offer, each of which has a travel allowance that will enable a short field trip to either Hanoi or Ho Chi Minh City to assist with field work and further development.

Aim and Outline
There are three projects on offer:
1) A mobile app to support the collection of statistics and other data tracking interactions between librarians and students.
2) A mobile app to support library services to users.
3) An investigation into the uses of social media in supporting the student population.

Pre- and Co-requisite Knowledge
Projects 1 & 2: One of FIT2081, FIT3027 (Android), FIT4039 (Android), FIT5046
Project 3: Be undertaking the IKM stream within the MBIS. Units such as FIT5090 Social Informatics or FIT5105 are an advantage but not essential.


Learning when to "sacrifice" in ultimate tic-tac-toe
Supervisors: Aldeida Aleti and Pierre Le Bodic

Background
Ultimate tic-tac-toe is a more sophisticated version of the well-known and slightly boring tic-tac-toe. Each square of the ultimate tic-tac-toe board contains a similar but smaller board. In order to win a square on the main board, you have to win the small board inside it. But the most important rule is that you don't pick which of the nine boards to play on; it is determined by your opponent's previous move. The square your opponent picks determines the board you have to play in next.

This makes the game harder, but also more exciting. You cannot just focus on the immediate reward; you must also think ahead and consider future moves. It requires deductive reasoning, conditional thinking, and an understanding of the geometric concept of similarity.
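
To make the move-forcing rule concrete, here is a minimal Python sketch of legal-move generation in a simple representation of the game. The encoding and names are illustrative, and for brevity the sketch ignores small boards that have already been won (only full boards release the constraint here).

EMPTY = "."

def new_game():
    """9 small boards of 9 cells each; None means no move has been played yet."""
    return [[EMPTY] * 9 for _ in range(9)], None

def legal_moves(boards, last_move):
    """(board, cell) pairs available to the player about to move."""
    def open_cells(b):
        return [(b, c) for c in range(9) if boards[b][c] == EMPTY]

    if last_move is not None:
        _, last_cell = last_move          # the cell played last time...
        forced = last_cell                # ...forces the matching small board
        if any(cell == EMPTY for cell in boards[forced]):
            return open_cells(forced)
    # First move, or the forced board is full: play any open cell anywhere.
    return [m for b in range(9) for m in open_cells(b)]

# Example: X plays cell 4 of board 0, so O must now play somewhere on board 4.
boards, last = new_game()
boards[0][4] = "X"
last = (0, 4)
print(legal_moves(boards, last)[:3])      # e.g. [(4, 0), (4, 1), (4, 2)]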

Aim and Outline
In this project, we will investigate efficient algorithms for solving ultimate tic-tac-toe, with the main focus on learning moves that require "sacrificing" immediate reward in order to win the game.

URLs and References
http://ultimatetictactoe.creativitygames.net/

Pre- and Co-requisite Knowledge
Knowledge of algorithms and problem solving.


Web traffic analysis for understanding patient perceptions of pharmacotherapy for Rheumatoid Arthritis
Supervisors: Pari Delir Haghighi, Frada Burstein and Helen Keen (University of Western Australia)

Background
Rheumatoid Arthritis (RA) is a common, incurable disabling disease. Conventional synthetic disease-modifying therapies (csDMARDs) have been the standard of care, but in recent years biologic therapies have been used increasingly. Costs are rapidly escalating due to reductions in csDMARD use; the reasons for this are unclear and may be patient-driven. Understanding patient perceptions of pharmacotherapy may aid optimisation of csDMARD use.

Aim and Outline
The project will focus on web traffic analytics by exploring, examining and reviewing Google search results for a set of keywords related to Rheumatoid Arthritis and its treatment (i.e. conventional DMARDs (csDMARDs), biologic DMARDs (bDMARDs) and biosimilars). It will assess the content available on these websites for accuracy, credibility and suitability.

Web traffic analytics will be used to generate a list of the most visited websites with regard to the above keywords. These will be stratified into two broad categories - government/organisation-affiliated websites and user-generated content. The former will be subject to sentiment analysis to review the tone of the information provided to the patient; the latter will be reviewed by physicians with experience in the relevant fields for accuracy of the information provided.
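
As an indication of the tooling involved (a sketch only, with placeholder data), the snippet below scores text from government/organisation-affiliated pages with NLTK's VADER sentiment analyser and sets user-generated content aside for clinician review. The URLs-to-text harvesting step is omitted; a real pipeline would first crawl and clean full page text from the top-ranked sites identified by the traffic analysis.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyser = SentimentIntensityAnalyzer()

pages = [  # (category, text snippet) -- illustrative placeholders only
    ("gov_org", "Methotrexate is a well-established treatment that helps many "
                "people control rheumatoid arthritis symptoms."),
    ("gov_org", "Biologic DMARDs can cause serious side effects and require "
                "regular monitoring by your doctor."),
    ("user_generated", "I stopped my csDMARD because the side effects were awful."),
]

for category, text in pages:
    if category == "gov_org":
        scores = analyser.polarity_scores(text)   # dict with a 'compound' score
        print(f"{category}: compound={scores['compound']:+.2f}  {text[:50]}...")
    else:
        print(f"{category}: flagged for clinician accuracy review")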

URLs and References
Martinez et al. (2017) Patient Understanding of the Risks and Benefits of Biologic Therapies in Inflammatory Bowel Disease: Insights from a Large-scale Analysis of Social Media Platforms, Inflamm Bowel Dis. 2017 Jul;23(7):1057-1064

Pre- and Co-requisite Knowledge
Information management and data analysis skills are preferred.


Modelling the effect of autonomous vehicles on other road users
Supervisors: Dr John Betts (FIT) Prof. Hai L. Vu (Faculty of Engineering)

Background
Underpinned by emerging technologies, connected and autonomous vehicles (CAVs) are expected to introduce significant changes to driver behaviour, traffic flow dynamics, and traffic management systems.

Aim and Outline
This project aims to investigate the impact of connected and autonomous vehicles on traffic flows and evaluate new possibilities for efficiently managing traffic on future urban road networks.

In this project, students will explore and evaluate the impact of this disruptive technology by implementing and integrating new car-following models into an existing traffic simulation to study the behaviour of CAVs and their interaction with other vehicles and their drivers.
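
One candidate building block is the Intelligent Driver Model (IDM), a standard car-following model whose parameters (desired time gap, minimum spacing, acceleration limits) can be varied to contrast human-like and CAV-like behaviour. The sketch below is illustrative only: the parameter values are not calibrated, the "CAV-like" settings are an assumption, and the project would embed such models in a full traffic simulator rather than this toy loop.

import math

def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0):
    """IDM acceleration for a follower at speed v behind a leader at v_lead."""
    dv = v - v_lead                                   # closing speed
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a * b))
    return a * (1 - (v / v0) ** 4 - (s_star / max(gap, 0.1)) ** 2)

# Follow a leader doing a steady 20 m/s, starting 50 m behind at 25 m/s.
human = dict(T=1.5, s0=2.0)        # human-like: longer time gap
cav = dict(T=0.6, s0=1.0)          # "CAV-like": shorter time gap (assumed)

for label, params in [("human", human), ("CAV", cav)]:
    v, x, x_lead, dt = 25.0, 0.0, 50.0, 0.5
    for _ in range(120):                              # simulate 60 seconds
        acc = idm_acceleration(v, 20.0, x_lead - x, **params)
        v = max(0.0, v + acc * dt)
        x += v * dt
        x_lead += 20.0 * dt
    print(f"{label}: final gap = {x_lead - x:.1f} m, speed = {v:.1f} m/s")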

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.


Decision models for managing large crowds
Supervisors: Dr John Betts (FIT) Prof. Hai L. Vu (Faculty of Engineering)

Background
As populations grow and urbanisation increases, large crowds of pedestrians are becoming the norm in major cities. Large crowds also form at sporting, entertainment, cultural and religious events. It is important to plan and develop strategies for managing such crowds efficiently and safely.

Aim and Outline
This project aims to develop a simulation tool that can assist with timely decisions and resource allocation in the emergency management of large crowds within an urban setting.

In this project, students will explore models and their implementation using agent-based simulation to simulate pedestrian behaviour and to develop crowd management strategies for large crowds.
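
To give a flavour of the agent-based approach, the sketch below moves point agents towards a single exit while adding crude pairwise repulsion, loosely in the spirit of social-force pedestrian models. The constants, geometry and force shapes are placeholders, not a validated model.

import numpy as np

rng = np.random.default_rng(1)
n_agents = 50
pos = rng.uniform(0, 20, size=(n_agents, 2))   # agents in a 20 m x 20 m room
exit_point = np.array([20.0, 10.0])            # single exit on the right wall

def step(pos, dt=0.1, desired_speed=1.3, repulsion=0.5):
    # Drive term: head towards the exit at the desired walking speed.
    to_exit = exit_point - pos
    dist_exit = np.linalg.norm(to_exit, axis=1, keepdims=True) + 1e-9
    drive = desired_speed * to_exit / dist_exit
    # Repulsion term: push away from nearby agents, decaying with distance.
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=2) + 1e-6
    np.fill_diagonal(dist, np.inf)
    push = (repulsion * diff / dist[..., None] ** 2).sum(axis=1)
    return pos + dt * (drive + push)

for t in range(300):                           # simulate 30 seconds
    pos = step(pos)
mean_dist = np.linalg.norm(exit_point - pos, axis=1).mean()
print(f"mean distance to exit after 30 s: {mean_dist:.1f} m")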

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.


The Future City: modelling the impact of disruptive technologies
Supervisors: Dr John Betts (FIT) Prof. Hai L. Vu (Faculty of Engineering)

Background, Aim and Outline
Future cities are smart but full of surprises. In this project, students will explore an open source game engine (simcity.com) to build and model a future city. The focus will be on linking this open source software with the open source agent-based simulation framework MATSim (matsim.org) to evaluate changes in society due to the emergence of disruptive technologies. For example, how are people’s transportation habits affected by the emergence of driverless cars or shared mobility?
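
To give a sense of the coupling involved, the sketch below writes a toy grid road network in what is assumed here to be MATSim's network_v1 XML format; the DOCTYPE and attribute names should be verified against the matsim.org documentation. An actual integration would derive the nodes and links from the city-builder's road layout rather than hard-coding a grid.

nodes = {}                                 # id -> (x, y), 500 m grid spacing
for i in range(2):
    for j in range(2):
        nodes[f"n{i}{j}"] = (i * 500.0, j * 500.0)

links = []                                 # directed links along the grid
for a in nodes:
    for b in nodes:
        (xa, ya), (xb, yb) = nodes[a], nodes[b]
        if a != b and abs(xa - xb) + abs(ya - yb) == 500.0:
            links.append((f"{a}_{b}", a, b, 500.0))

with open("network.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="utf-8"?>\n')
    f.write('<!DOCTYPE network SYSTEM '
            '"http://www.matsim.org/files/dtd/network_v1.dtd">\n')
    f.write('<network name="toy-grid">\n  <nodes>\n')
    for nid, (x, y) in nodes.items():
        f.write(f'    <node id="{nid}" x="{x}" y="{y}"/>\n')
    f.write('  </nodes>\n  <links capperiod="01:00:00">\n')
    for lid, frm, to, length in links:
        f.write(f'    <link id="{lid}" from="{frm}" to="{to}" '
                f'length="{length}" freespeed="13.9" capacity="1000" '
                f'permlanes="1"/>\n')
    f.write('  </links>\n</network>\n')
print(f"wrote network.xml with {len(nodes)} nodes and {len(links)} links")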

References
http://www.simcity.com/
http://www.matsim.org/

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.