Graduate Research Projects

The Faculty of Information Technology is currently offering a number of graduate research projects and scholarship opportunities. We are seeking eligible applicants to undertake research projects with our research flagships, centres and labs.

Applications from high-calibre candidates are welcome at any time. Please note, however, that applications for these projects and scholarships close on either 31 May or 31 October (unless otherwise stated).

Prospective Indian students interested in studying at the IITB-Monash Research Academy should click here to view those projects.


Key

  • Projects that already have some funding available: tuition fee waivers (full or partial), stipends, or both. Contact the supervisor for more information.
  • Projects associated with NICTA: PhD candidates undertaking NICTA-affiliated research are eligible to apply for a Monash NICTA VRL Stipend Scholarship.
  • Projects where funding of PhD candidates might become available, but has yet to be confirmed.
  • MIGR: Monash Institute of Graduate Research (MIGR). The candidate applies for an MIGR scholarship.

 

Computational Biology

Funding Project Title Supervisors

A Computational Model for Enhancing Diagnostic Ultrasound

We propose an interdisciplinary partnership between DMIRS and the Faculty of IT to develop a computational model of the experimental ultrasound configuration that helps interpret the measured data. The model will incorporate the physics required to propagate the ultrasound beam into tissue and then determine the subsequent energy absorption and localised heating. This model will be validated against experimentally acquired data and then used to predict how sophisticated analysis routines can improve image quality in the field of ultrasound.
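The description above refers to propagating an ultrasound beam into tissue and determining the resulting energy absorption and heating. As a purely illustrative 1D sketch (not the project's actual model), the fragment below uses textbook ballpark values for soft-tissue attenuation, density and heat capacity; all numbers are assumptions for illustration only.

```python
# Illustrative 1D plane-wave attenuation and heating sketch (toy values).
import numpy as np

freq_mhz = 3.0                    # transducer frequency
alpha_db = 0.5 * freq_mhz         # amplitude attenuation, dB/cm (~0.5 dB/cm/MHz in soft tissue)
alpha_np = alpha_db / 8.686       # convert dB/cm to Np/cm
i0 = 1.0                          # incident intensity, W/cm^2
rho_c = 1000.0 * 3600.0 * 1e-6    # density * heat capacity, J/(cm^3 K)

z = np.linspace(0.0, 10.0, 200)                # depth in cm
intensity = i0 * np.exp(-2.0 * alpha_np * z)   # plane-wave intensity decay with depth
q = 2.0 * alpha_np * intensity                 # absorbed power density, W/cm^3
dT_dt = q / rho_c                              # initial local heating rate, K/s

print(f"heating rate at 2 cm depth: {np.interp(2.0, z, dT_dt):.4f} K/s")
```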

The project will be supervised by Dr Matthew Dimmock (DMIRS) and Dr David Albrecht (Machine Learning), both of whom have strong track records in medical imaging and computer modelling [5-9] and are currently active researchers and PhD candidate supervisors at Monash.

[1] S.H. Contreras Ortiz et al. (2012), Bio. Sig. Proc. Cont., Vol. 7, p. 419.
[2] P.A. Brennan et al. (2012), Brit. Jour. Oral Max. Surg., Vol. 50, p. 333.
[3] T. Misaridis et al. (2005), IEEE Trans. Ultras. Ferro. Freq. Control, Vol. 52, p. 177.
[4] I. Despotovic et al. (2008), Ann. Works. Circ. Syst. Sig. Proc.
[5] J.M.C. Brown et al. (2014), Nucl. Instr. Meths. B, Vol. 338, p. 77.
[6] M.R. Dimmock et al. (2012), IEEE Trans. Nucl. Sci., Vol. 59, p. 1738.
[7] M.R. Dimmock et al. (2009), Nucl. Instr. Meths. A, Vol. 612, p. 133.
[8] T. Lwin and D. Albrecht (2014), Jour. Stat. Plan. and Inf., Vol. 146, p. 70.
[9] M. Byrne et al. (2008), Jour. Comp. Civ. Eng., Vol. 22, p. 90.

Collaborative Creation and Analysis of SBGN Maps and Related Information

Work in the life sciences is increasingly collaborative and depends more and more on standards for knowledge representation and exchange. The Systems Biology Graphical Notation (SBGN) is a standard for the visual representation of cellular processes and biological networks. In addition to representing knowledge about these processes and networks, it is used to provide a framework for the integration and analysis of high-throughput data from biological or medical experiments, such as genomics, proteomics and metabolomics data. This PhD project aims to develop novel methods for collaborative work with SBGN maps. The major task will be the development of algorithms and methods to support collaborative visualisation and exploration of SBGN maps and related information from experiments. Different types of collaboration should be investigated: local collaboration at one location as well as distant collaboration at different places. The project will employ novel technologies for collaborative work currently under development between the Clayton and Caulfield campuses. The expected outcome will be algorithms, methods, and software which support collaborative work, as well as a better understanding of the use of collaborative technologies for the biological sciences.

Designing thermostable and aggregation-resistant proteins

An interdisciplinary PhD project funded by a Monash Computational-Biomedical PhD Scholarship.

Protein thermostability greatly affects usability as a therapeutic drug or industrial agent. A more stable protein will have a longer shelf life, have fewer technical requirements for storage, and remain efficacious in situ for longer periods of time. Consensus design has shown that it is possible to use evidence from proteins developed through evolution to improve thermostability. Using state-of-the-art data analytics techniques we have developed De novo Protein Design, a novel technology for protein stabilisation that we anticipate will achieve the stabilisation of consensus design while retaining full functionality. This project will conduct laboratory experiments to characterise the structure and thermostability of novel proteins created by De novo Protein Design.
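As a toy illustration of the consensus-design idea referred to above (not the group's De novo Protein Design technology), the sketch below simply takes the most frequent residue at each position of a small, hypothetical alignment of homologues.

```python
# Toy consensus-sequence sketch over a hypothetical pre-aligned set of homologues.
from collections import Counter

aligned = [          # invented, equal-length aligned sequences
    "MKTAYIAKQR",
    "MKSAYIAKQR",
    "MKTAYLAKHR",
    "MRTAYIAKQR",
]

consensus = "".join(
    Counter(column).most_common(1)[0][0]   # most frequent residue in this column
    for column in zip(*aligned)            # iterate over alignment columns
)
print(consensus)   # -> MKTAYIAKQR
```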

Development of big data-driven bioinformatics approaches and tools for cost-effective identification of potential post-translational modification types and sites in the human proteome

Through efficient knowledge discovery and machine learning on currently available large-scale experimental PTM data, this joint PhD project aims to develop and deliver machine learning approaches and bioinformatics tools that can efficiently detect all potential functional PTM types and sites likely to occur in the whole human proteome.

  • Geoff Webb
  • James Whisstock (SOBS)
  • Jiangning Song (Monash Bioinformatics Platform, SOBS)
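For illustration, one common formulation of PTM site prediction represents each candidate site by the window of residues around it, encodes the window numerically, and trains a classifier; the sketch below uses invented toy data and is not necessarily the approach this project will take.

```python
# Minimal window-based PTM site classification sketch (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(window):
    """One-hot encode a fixed-length peptide window around a candidate site."""
    vec = np.zeros(len(window) * len(AMINO_ACIDS))
    for i, aa in enumerate(window):
        if aa in AMINO_ACIDS:
            vec[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec

# Toy training data: 15-residue windows, 1 = modified site, 0 = unmodified site.
windows = ["AAAARRASLLESSAA", "GGGGSPTSPPAKKKA", "LLLLVVVAVILLLAA", "PPPPKKTAKKKPPPP"]
labels = [1, 1, 0, 0]

X = np.array([encode_window(w) for w in windows])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba([encode_window("AAAARRSSLLESSAA")])[0])
```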

Exploration of big data enabled pervasive visualisation method for sustainable urbanization

This PhD project aims to develop novel visualisation and interaction methods, enabled by big data and pervasive computing technology, that better allow researchers and end users to explore the model and to investigate different approaches to urban informatics. The study will explore the technological features, potential, and relevance to people's everyday lives of these methods by developing a needs-focused case study, such as a 'Next-G portable navigation system' that targets a significant increase in the eco-efficiency of urban passenger transportation. The major task will be the development of algorithms and methods to interactively visualise the model in 2D and 3D, and to provide methods for developers to engage with the model in an immersive way. The project should utilise novel visualisation environments such as the RealSense, zSpace, CAVE2 and large displays. The expected outcome will be algorithms, methods, and a VR/AR environment which help investigate interactive solutions as well as further models of information visualisation.

Today, more than half the world's population lives in cities, and the UN expects this figure to rise to two-thirds by 2050. Already, cities are responsible for an estimated 70% of global energy use, a figure that can only rise with further urbanisation. With such rapid urbanisation, enabling modern cities to deliver services effectively, efficiently and sustainably while keeping citizens safe, healthy, prosperous and well informed is among the century's most important undertakings. Big data, empowered by cloud computing and mobile devices, makes it possible for researchers to tackle the grand challenges of urban informatics. This provides great potential for cities to cut their energy use in order to reduce greenhouse gas emissions, the Urban Heat Island effect, urban air pollution, and global fossil fuel (especially oil) depletion. Available approaches include shifting to alternative energy, energy efficiency improvements, and energy conservation. We find that although both increased use of renewable energy and energy efficiency improvements are desirable, they will be unable to significantly reduce fossil fuel use by 2050. This is important because many researchers argue that large reductions in oil, and to a lesser extent other fossil fuels, will be needed within the next two decades because of their mounting energy, monetary and environmental costs.

Exploration of pervasive visualization method for spinal movement

In Australia, a 2007 study by Access Economics estimated back pain was costing the economy A$34.3 billion in lost workplace productivity and treatments. In the UK, lower back pain has been identified as the most common cause of disability in young adults, with more than 100 million workdays lost per year. While the emphasis in OH&S is on preventative measures, such as keeping chairs, desks and computers at the right height, it is not yet possible for office furniture to tell the worker whether they are sitting correctly, or when it is time to get up and move around.

This PhD project aims to develop novel visualisation and interaction methods, enabled by pervasive computing technology, that better allow researchers and end users to explore the model and to investigate different scenarios of spinal posture and behaviour when sitting in a chair. The major task will be the development of algorithms and methods to interactively visualise the model in 2D and 3D, and to provide methods for developers to engage with the model in an immersive way. The project should utilise novel visualisation environments such as the CAVE2 and large displays. The expected outcome will be algorithms, methods, and a virtual environment which help investigate the model, and a better understanding of the use of immersive technologies for in-depth understanding of spinal health.

 

Genomics of Disease

Cost-effective next-generation nucleotide sequencing technologies provide researchers with genomic data of remarkable precision at unprecedented rates. For the first time in human history, this has resulted in an ability to investigate at high resolution the molecular mechanisms driving diseases such as cancer, among other heritable diseases. The focus of this project will be to identify the mutations and aberrations in the host genome that drive disease manifestation, and the underlying evolutionary mechanisms responsible for progression, metastasis (in the case of cancer), and relapse during the life span of the disease.

Immersive Analytics for the Virtual Cell

Recently, the first complete model of a simple cell was published (Karr et al., Cell 150(2): 389–401, 2012). It describes the organism M. genitalium and consists of several submodels for specific biological processes. Although the model can be simulated and used for predictions, it is difficult to understand due to its size and complexity. This PhD project aims to develop novel visualisation and interaction methods to better allow users to explore the model and to investigate different scenarios of the behaviour of the organism when changing model parameters. The major task will be the development of algorithms and methods to interactively visualise the model in 2D and 3D, and to provide methods for users to engage with the model in an immersive way. The project should utilise novel visualisation environments such as the CAVE2 and large displays. The expected outcome will be algorithms, methods, and a virtual environment which help investigate the model as well as further models of other organisms, and a better understanding of the use of immersive technologies for the biological sciences.

Integration of genomic data from diverse data streams

This cross-disciplinary PhD project is a unique opportunity to join the fast-moving convergence of biology and computer science. The project will explore the integration of diverse genomic data and experiment types. This is surprisingly complex, for each dataset varies drastically in quality, consistency and completeness. There are technical problems joining datasets with differing and evolving naming conventions, and more fundamental problems when joining datasets from different strains or species. Amongst the most reliable datasets are those derived from Baker's Yeast, a model organism used extensively to uncover basic eukaryotic biology. In the Beilharz lab, this organism is used to understand links between transcriptional and post-transcriptional regulatory control. This means that we probe which subsets of RNA are expressed under changing conditions and then integrate this with data collected by other studies across the world.
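As a tiny illustration of the naming-convention problem described above, the sketch below joins two hypothetical yeast datasets that refer to the same genes by different identifiers, going through an alias table rather than joining directly on a shared name column; the values and file layouts are invented.

```python
# Illustrative join of two datasets with different gene naming conventions.
import pandas as pd

expression = pd.DataFrame({          # uses systematic ORF names
    "orf": ["YGR192C", "YLR044C"], "tpm": [5123.0, 887.0]})
half_life = pd.DataFrame({           # uses standard gene names from another study
    "gene": ["TDH3", "PDC1"], "half_life_min": [22.0, 15.0]})
aliases = pd.DataFrame({             # mapping between the two conventions
    "orf": ["YGR192C", "YLR044C"], "gene": ["TDH3", "PDC1"]})

merged = expression.merge(aliases, on="orf").merge(half_life, on="gene")
print(merged)
```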

Complementing this work, the Konagurthu lab specialises in computational biology and bioinformatics with macromolecular data, with a strong focus on statistical inductive inference and applied algorithms. The collaboration between these two teams will ensure that the best contemporary computational research is used to advance the biological understanding of these diverse data streams.

This project will suit an applicant with a strong background in computer science and statistics, in addition to a curiosity to explore the amazing (but messy) complexity of life on Earth.

Investigating and visualising cross-talk between biological processes

Cancer is a very complex disease, with widely varying degrees of treatment success. There is increasing appreciation that there is cross-talk between cancer progression and the immune system. This is, in turn, driving the use of powerful computational and mathematical approaches to complement laboratory work in understanding and exploiting the role of the immune system in cancer progression, prognosis and the development of more effective therapies. However, while building a single model to capture all these processes at once is highly desirable, it is also extremely costly.

This PhD project will look at how potential cross-talk could be mapped between two processes, as an effective and less costly experimental stage. This involves modelling different processes with mathematical methods such as Boolean networks, deriving mappings between different processes, investigating the effect of these mappings on the overall network, and developing novel methods for visualising this information. The aim is to investigate the use of visualisation approaches integrated with computational modelling to obtain a better understanding of complex biological and disease processes. This project is suited to a student with very good IT, mathematical and programming skills and a strong interest in biology. The project includes collaboration with clinical and pharmaceutical industry partners.
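To make the Boolean-network modelling mentioned above concrete, here is a minimal sketch in which each node is ON or OFF and update rules encode regulation; the three-node network is invented for illustration and is not a real cancer or immune pathway.

```python
# Minimal synchronous Boolean network sketch (toy three-node network).
rules = {
    "Growth": lambda s: s["Signal"] and not s["Immune"],   # growth needs signal, is blocked by immunity
    "Immune": lambda s: s["Growth"],                        # immune response reacts to growth
    "Signal": lambda s: s["Signal"],                        # external signal held constant
}

def step(state):
    """Synchronously apply every update rule once."""
    return {node: rule(state) for node, rule in rules.items()}

state = {"Growth": False, "Immune": False, "Signal": True}
for t in range(5):
    print(t, state)
    state = step(state)   # the toy network settles into an oscillation (cross-talk between Growth and Immune)
```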

Modelling and visual analytics of Pseudomonas aeruginosa metabolism

The bacterium Pseudomonas aeruginosa is one of the superbugs that can cause deadly disease in humans and is difficult to treat. Therefore, a better understanding of the major biological processes of this bacterium is essential to finding or developing new drugs. Metabolism (i.e., the chemical processes that occur within an organism in order to maintain life) plays a central role in this investigation.

This joint PhD project aims to reconstruct the metabolic network of Pseudomonas aeruginosa as a stoichiometric model. Different methods from computer science, such as topological analysis, Petri net modelling, and Flux Balance Analysis, will be employed to simulate and understand the metabolic network of this strain. The project focuses on two aspects: (1) to help in better understanding the complex processes, novel ways to interactively visualise the network, network alterations, and simulation results should be developed; (2) to produce new hypotheses regarding potential new targets for fighting this bacterium, alterations of the network (e.g., by introducing new compounds or by targeted disruption of the network) should be studied.
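For concreteness, the sketch below shows Flux Balance Analysis on a toy three-reaction network (not the real P. aeruginosa reconstruction): the flux through a "biomass" reaction is maximised subject to steady-state mass balance S·v = 0 and flux bounds, using an off-the-shelf linear programming solver.

```python
# Toy Flux Balance Analysis sketch: maximise biomass flux subject to S.v = 0.
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix: rows = metabolites A, B; columns = reactions
#   R1: -> A,  R2: A -> B,  R3: B -> biomass (objective)
S = np.array([
    [1, -1,  0],   # metabolite A
    [0,  1, -1],   # metabolite B
])
bounds = [(0, 10), (0, 10), (0, 10)]     # flux bounds for each reaction
c = np.array([0, 0, -1])                 # linprog minimises, so negate the objective

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)          # expected: [10, 10, 10]
```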

This project is suited to a student with very good mathematical, computer science, and programming skills and a strong interest in integrative systems biology.

Modelling of the developmental cardiac gene regulatory network

Heart disease is the number one killer in Australia, and 1 in 100 babies are born with a heart defect. To be able to treat heart disease, it is essential to understand how the heart forms in the first place. Proper heart formation is orchestrated by the expression of cardiac genes at the right time (temporal) and the right place (spatial) in the developing embryo. Therefore, understanding spatio-temporal cardiac gene regulation is highly important for identifying the genetic origin of heart disease, which can then be targeted for further drug development.

This PhD project aims to reconstruct the network of genes which control heart development using mathematical methods for network modelling. Quantitative predictions by the model will be directly assessed using molecular biology tools in the zebrafish, a well-described model for studying heart development in vivo. Interactive visual representations will be developed to support a better understanding of the processes involved and to help predict important interactions that could cause heart disease if altered.
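As an illustrative fragment of the kind of quantitative network model involved, the sketch below models a single cardiac gene activated by a transcription factor through a Hill function and degraded at a constant rate; all parameter values are invented for illustration and the real model would couple many such equations.

```python
# Toy one-gene regulatory ODE: Hill activation minus first-order decay.
import numpy as np
from scipy.integrate import solve_ivp

def model(t, y, k_syn=1.0, K=0.5, n=2, k_deg=0.2, tf=1.0):
    g = y[0]
    dg_dt = k_syn * tf**n / (K**n + tf**n) - k_deg * g   # activation by TF minus degradation
    return [dg_dt]

sol = solve_ivp(model, t_span=(0, 50), y0=[0.0], t_eval=np.linspace(0, 50, 11))
print(sol.t)
print(sol.y[0])   # approaches the steady state k_syn * 0.8 / k_deg = 4.0
```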

This project is suited to a student with very good IT and mathematical skills and a strong interest in developmental biology. Programming skills are essential; wet-lab experimental skills will be acquired during the project.

MIGR

Next-generation protein structural comparison using information theory

Protein structural comparison is the most fundamental and widely used task in protein science. This research will develop the next-generation methods for structural comparison based on a rigorous information-theoretic framework. The results will have direct payoffs to the fields of structural biology and crystallography.

Sequence-Structure Determinants of the Protein Repertoire

Underlying the great variety of protein structures are many recurrent patterns, analogous to LEGO pieces. This research will discover a comprehensive dictionary of structural building blocks, their assembly rules and sequence-structure determinants. The outcomes of this research will impact, among other areas of structural bioinformatics, the field of computational protein structure prediction.

Simulating bee foraging: how behavioural diversity in bees interacts with environmental conditions

Recent investigations of pollinator-plant interactions show that the learnt flower preferences of important pollinators like bees are dependent upon both flower temperature and regional ambient temperatures. This shows that local and global changes in climatic conditions may directly influence how certain plants are pollinated. This project is producing computer simulations to reveal how climate change may directly influence flower evolution in the future, and how the management of environmentally and economically important plants can be modelled to inform reliable decision making about this important resource.

Understanding brain network dynamics and topology in its anatomical context

Brain network data is extrapolated from observed co-occurrences of activity at various brain regions. In studying this data, both network topology (that is, the connectivity amongst brain regions indicated by high edge weights) and the anatomical context (that is, the physical situation of the end-points of each edge within the brain) are important.

Thus, we need to develop new methodologies for interactive visualization to display the data in ways that show the anatomical situation of the network data within a 3D brain model but that can also optimally unravel the network to provide a better view of network topology.

This is an interdisciplinary project in that the PhD candidate will be supported by Monash Medical Imaging in understanding the clinical application while the Faculty of IT will assist with expertise in computer graphics, data analysis and visualisation.

Visualisation of expressive extensions of the Gene Ontology

The Gene Ontology (GO) (Ashburner et al. 2000) is one of the most widely used biomedical ontologies. It comprehensively describes different aspects of genes and gene products under three broad categories: (1) biological process, (2) cellular component and (3) molecular function. GO is currently being actively extended in the LEGO project to create a new annotation system that provides more extensible and expressive annotations.

This project aims to investigate novel visualisation techniques for GO-LEGO and its integration with the Systems Biology Graphical Notation (SBGN). The information can be seen as networks, and although there are some solutions for network layout, none is particularly tailored towards this application. Specifically, we will investigate how domain knowledge captured in ontologies can assist in creating high-quality layouts.

[1] Ashburner, Michael, et al. "Gene Ontology: tool for the unification of biology." Nature Genetics 25.1 (2000): 25-29.
[2] Le Novère et al. "The Systems Biology Graphical Notation." Nature Biotechnology 27 (2009): 735-741.

 

Data Systems and Cybersecurity

To be advised. For more information see Data Systems and Cybersecurity.

 

Immersive Analytics

Funding Project Title Supervisors

Effectiveness of Tactile Diagrams

Everyone knows the saying that "a picture is worth 10,000 words." But what if you are blind? Tactile graphics, sometimes called tangible graphics, are currently the best way of providing access to graphical material for people who are blind. In use since the 18th century, they are a graphical analogue to braille. They use objects with different textures and heights and are read with the hands.

Research has shown that touch and/or sound allow blind people to build internal spatial representations that are functionally equivalent to those obtained from visual input [1]. Tactile maps have been demonstrated to help blind children and adults to build a survey-like representation of their environment, thus facilitating navigation [2].

However, there has been much less research into the benefits of tactile graphics for other kinds of information graphics, such as flow charts or diagrams of mechanical devices. The aim of this project will be to conduct user studies with blind people to identify what benefits, if any, tactile diagrams provide, and from this to develop guidelines for presenting information using tactile diagrams.

[1] Z. Cattaneo et al. Imagery and spatial processes in blindness and visual impairment. Neuroscience and Biobehavioral Reviews 32(8): 1346-1360, 2008.
[2] S. Ungar. Cognitive mapping without visual experience. In Cognitive mapping: Past, present and future. Routledge, 2000.

Exploring Learning Analytics Visualisation Techniques for Better Understanding of Student Learning Behaviours

The current trend for on-campus students to spend less time in the classroom and more time online, combined with the increasing popularity of MOOCs, has led to the need to find alternative ways for academics to support students and identify those at risk. Similarly, new ways need to be found to allow students to better engage with their peers and understand their place in a subject cohort. This project seeks to use student engagement data, captured and analysed using learning analytics, to explore information visualisation techniques that can provide both academics and students with a better understanding of student academic performance and learning styles. The outcomes of the project will be: (a) an understanding of the learning analytics data available to create multi-dimensional profiles of student engagement behaviour, and (b) prototype visualisations of student profiles showing academic performance, learning styles and relationships to fellow students. Findings will be vital for the further development of online learning systems as well as for supporting academics and students in a changing education environment.

 

IT for Resilient Communities

Funding Project Title Supervisors
MIGR

Crowd Intelligence to enhance information architecture for Smart Health Information Portals: optimise health outcomes for the Autism Community

This project is part of a larger program of research relating to the Community Health Informatics theme in the IT for Resilient Communities Flagship. It aims to utilise crowd intelligence to deliver relevant and personalised healthcare information, within a quality framework, to the autism community and professional sector. Crowd intelligence, or "crowdsourcing", has been recognised as one of the most efficient means of collecting data and information from a collective who share the same or similar life experiences. There is a need to study how this data can help meet the health information needs of online information consumers. The project will contribute to the development of an autism-specific online Health Information Portal (HIP) and Mobile Autism Health Application (MAHA) platform, which are currently under investigation in collaboration with medical practitioners and an autism advocacy group.

The key findings of this study will contribute to improving the provision of autism-specific medical information and, more broadly, to the design of methods and architectures for Smart Health Information Portals.

The successful candidate will have an excellent academic track record in areas related to information systems research, particularly in relation to web-based information provision. Strong IT skills, web design, mobile applications development and/or experience of working with communities would be highly advantageous.

Dynamic Descriptive Interfaces for Participatory Community Archival Networks

Recordkeeping and archiving are fundamental infrastructural components supporting community information, self-knowledge and memory needs. If developed in community-driven ways, they can contribute to resilient communities and cultures and to pan- or trans-community endeavours, whereas traditional institutional approaches can often lead to community disconnection and disempowerment. Currently the archival records relating to many communities are fragmented, dispersed and dysfunctional. Spread across archival institutions, they appear unmanaged, invisible and inaccessible from a community-centred perspective. Traditional archival description and access frameworks have focused on visibility and accessibility for scholars and researchers, causing many community-driven archives, wary of losing control of and access to their records, to resist the collecting activities of institutions.

The Resilient Cultures Theme of the IT for Resilient Communities Flagship Research Program addresses R&D challenges relating to:

  • developing systems that capture, integrate, preserve, and make available community knowledge, and preserve cultural heritage
  • using innovative and leading edge technologies to build multimedia systems for visualising and animating culture, archiving oral memory, and helping communities to work with government and institutions on their own terms.

This project focuses on the metadata needs associated with:

  • building Sustainable Living Archives enabling long-term preservation, cross-generational transfer and interactive use of community knowledge, memory and culture
  • developing Information and Memory Infrastructure to support resilient communities and community-based scholarship
  • innovative use of IT to visualise and document cultural heritage.

The successful candidate will have an excellent academic track record in archival science. Strong IT skills and experience of working with communities would be highly advantageous.

Intelligent Ontology Engineering for Multi-disciplinary Applications

A domain ontology rigorously defines concepts and the relationships among them as it attempts to draw up a complete representation of knowledge of the domain. While this is well suited to uni-disciplinary applications of the ontology, it tends to limit application and becomes inadequate for interdisciplinary and multidisciplinary problems. For instance, a medical ontology on Diabetes Mellitus could be effortlessly used for medical diagnosis applications, whereas attempting to apply the same ontology to an interdisciplinary endeavour such as diabetes awareness (participatory medicine) or nutrition management (dietetics) will be unsuccessful. This project aims to fill the void of engineering an application-oriented ontology from a domain ontology for interdisciplinary and multidisciplinary problems. Not limited to the union of two (or multiple) ontologies, this requires addressing engineering complexities surrounding the selection of relevant concepts from each discipline, the integration of relationships, and the generation of new hybrid concepts. Further complications arise in certain scenarios (frequently when engineering a user-focused ontology) where a mature ontology is unavailable and the primary source of information is a constrained vocabulary/glossary, or is lacking altogether; novelty is necessary to derive the relevant concepts from such disciplines. CFinder is a key concept extraction technique that extracts noun phrases, using their linguistic patterns, as candidates for ontology development from a text corpus in the target domain. The project will use CFinder as the basis for developing a new approach to the development of application-oriented ontologies.
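As a rough sketch of the noun-phrase candidate extraction step that CFinder-style approaches build on, the fragment below uses generic NLTK chunking; this is not the CFinder implementation itself, and the grammar and example sentence are illustrative. It requires the NLTK "punkt" and "averaged_perceptron_tagger" data packages.

```python
# Generic noun-phrase candidate extraction sketch (not CFinder itself).
import nltk

GRAMMAR = "NP: {<JJ>*<NN.*>+}"        # optional adjectives followed by one or more nouns
chunker = nltk.RegexpParser(GRAMMAR)

text = ("Gestational diabetes requires careful nutrition management "
        "and regular blood glucose monitoring.")
tagged = nltk.pos_tag(nltk.word_tokenize(text))

candidates = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunker.parse(tagged).subtrees()
    if subtree.label() == "NP"
]
print(candidates)   # noun phrases extracted as ontology concept candidates
```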

PROTIC (Participatory Research and Ownership with Technology, Information and Change) - Resilient Communities Bangladesh Scholarship. [Download the Expression of Interest form or the Scholarship details]

PROTIC (Participatory Research and Ownership with Technology, Information and Change) is an innovative five-year mobile communications international development project. It works in regional Bangladesh trialling innovative information and communications strategies through participatory action research. PROTIC focuses on empowering women through enhancing community well-being and livelihoods in poor agricultural communities. Community involvement, particularly of women, is critical to the project: communities determine their information needs and priorities in conjunction with experts, rather than experts deciding what is important. PROTIC is funded through a partnership between the Faculty's Centre for Community and Social Informatics (COSI) and Oxfam Bangladesh, and is supported by a $3.68 million donation from the Empowerment Charitable Trust.

 

Machine Learning

Funding Project Title Supervisors

Analysing social media text

Over the last few years, there has been a growing public and enterprise interest in 'social media' and their role in modern society. At the heart of this interest is the ability for users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites, and social networking sites. The unprecedented volume and variety of user-generated content as well as the user interaction network constitute new opportunities for understanding social behavior and building socially intelligent systems. In this project, we employ techniques from Natural Language Processing and Machine Learning to analyze various types of social media text.

Data mining for medical informatics

Medical informatics is a major growth area within the medical community with the combination of more sophisticated record keeping and the emergence of evidence-based medicine as a best-practice methodology. Alfred Health is a leading proponent in Australia with a number of innovative, integrated data collections ideal for further analysis. Opportunities to understand and affect decision making and standards of privacy mean that developments in medical informatics are best done in situ. The Machine Learning flagship/research programme at the Faculty of IT in Monash works jointly with Alfred Health in this area.

There are a number of opportunities for projects within the joint programme. Data includes biochemistry, haematology, and microbiology tests, notes from medical staff, sanitised medical and lifestyle history, as well as a variety of scans. Thus a combination of structured, fielded, numeric and text data is available, depending on the project undertaken. Problems to be addressed might be to flag potential "at risk" patients, to support routine records analysis, or to develop "scores" for consideration by practitioners, all leading to improved patient health outcomes.

 

Deep Learning for Text/Language Processing

Deep learning (a rebranding of artificial neural networks) attempts to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations. Deep learning techniques have enjoyed tremendous success in the speech and language processing community in recent years (especially since 2011), establishing new state-of-the-art performance in speech recognition, language modelling, and some natural language processing tasks.

The focus of this PhD project is on deep learning approaches to problems in language or text processing. The project will explore various deep learning architectures (convolutional neural networks, recurrent neural networks, etc.) to advance the state of the art in real-life applications. Example problems include spoken language understanding (SLU), machine translation (MT), social media analysis, and semantic information retrieval (IR).
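A minimal sketch of one of the architectures mentioned, a recurrent network over word embeddings used as a text classifier, is shown below; the vocabulary size, dimensions and two-class task are placeholders rather than project choices.

```python
# Minimal PyTorch text classifier: embedding layer -> LSTM -> linear output.
import torch
import torch.nn as nn

class TextLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len) integer token ids
        embedded = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # final hidden state summarises the sequence
        return self.out(hidden[-1])               # (batch, num_classes) logits

model = TextLSTM()
dummy_batch = torch.randint(0, 10000, (4, 20))    # 4 "sentences" of 20 token ids
print(model(dummy_batch).shape)                   # torch.Size([4, 2])
```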

Hierarchical models of semantics in text

Medical informatics provides us with a unique capability to analyse text semantically, in that there are well-developed tools for annotating text with concepts from abstract hierarchies such as SNOMED CT. Probabilistic tools for developing models based on these hierarchies are more of a challenge. Topic models, text classification, and document summarisation are all areas that could benefit from somehow incorporating hierarchical semantic tags, though clearly the problem will require different statistical techniques. The theory of Dirichlet trees provides a basic scheme for addressing these problems, yet Dirichlet trees are strictly hierarchical, whereas language hierarchies have limited cross-links that make them only tree-like.

This project will develop computational statistical models capable of handling these non-tree language hierarchies, working with medical text, natural language tools, and statistical models such as topic models. This will result in better software tools for statistical and semantic processing of medical text.
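To make the Dirichlet-tree scheme concrete, the sketch below samples word probabilities from a small, made-up two-level tree: a Dirichlet is drawn at each internal node, and each leaf's probability is the product of the branch probabilities along its path from the root.

```python
# Sampling leaf (word) probabilities from a toy Dirichlet tree.
import numpy as np

rng = np.random.default_rng(0)

# tree: node -> list of (child, pseudo-count); nodes absent from the dict are leaves (words)
tree = {
    "root":    [("disease", 2.0), ("therapy", 1.0)],
    "disease": [("diabetes", 1.0), ("cancer", 1.0)],
    "therapy": [("insulin", 1.0), ("surgery", 1.0)],
}

def sample_leaf_probs(node="root", mass=1.0, out=None):
    """Recursively split probability mass at each internal node via a Dirichlet draw."""
    out = {} if out is None else out
    children = tree.get(node)
    if children is None:                 # leaf: assign the remaining mass to this word
        out[node] = mass
        return out
    names, alphas = zip(*children)
    branch = rng.dirichlet(alphas)       # Dirichlet draw over this node's children
    for child, p in zip(names, branch):
        sample_leaf_probs(child, mass * p, out)
    return out

word_probs = sample_leaf_probs()
print(word_probs, "sum =", round(sum(word_probs.values()), 6))
```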

Improved Statistical Models of Document Corpora for Text Retrieval

At the heart of modern Web search systems lie relatively simple non-linear functions that rank documents by their similarity to query keywords. Despite the criticality of these text retrieval functions, our theoretical understanding of them is somewhat limited in comparison to that of other non-linear models employed in Machine Learning. The BM25 function, for instance, arguably the most effective text retrieval function, is based on a relatively ad hoc combination of empirically derived heuristics, with little in the way of clear theoretical underpinning. The aim of this project is to develop the missing theory through the application of sophisticated statistical techniques to the modelling of text documents from a large Web crawl corpus.
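For concreteness, a compact (and slightly simplified) version of the BM25 ranking function referred to above is sketched below, with the usual k1 and b parameters and a tiny toy corpus.

```python
# Compact BM25 sketch over a tiny in-memory corpus.
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one document (a list of terms) against a query over a small corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)                 # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)        # smoothed inverse document frequency
        tf = doc_terms.count(term)                               # term frequency in this document
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)   # length normalisation
        score += idf * tf * (k1 + 1) / denom
    return score

corpus = [["web", "search", "ranking"],
          ["statistical", "language", "models"],
          ["ranking", "functions", "for", "search"]]
print(bm25_score(["search", "ranking"], corpus[2], corpus))
```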

Recent advances in Language Modeling have shown that hierarchical non-parametric models such as Pitman-Yor Processes [3] can be used to appropriately model power-law behaviour in natural language. This behaviour exhibits itself as “word burstiness”, whereby occurrences of the same term are not evenly distributed across the corpus but rather tend to co-occur in a small number of documents. Correctly modelling burstiness is important for accurate retrieval since it determines the amount of information that term counts carry with respect to the probable meaning of each document. This project will investigate advanced statistical models capable of modelling observed variations in burstiness across terms in a document corpus. The investigation will result in better theoretical understanding of textual data and improved Web search algorithms.

Latent Variable Models of User-Click Behaviour with Application to Personalized Web Search

The project will involve the analysis of advanced topic modelling techniques for estimating user click probabilities in query logs. This will result in improved algorithms for personalizing Web search results.

Many models have been proposed in the past for personalizing retrieval based on the profile information present in a query click-through log. None have been particularly successful except for dealing with the "obvious cases" where the user has already submitted the exact same query in the past. We believe the reason is that they were not based on principled statistical modelling of user interests and information needs. Latent topic modelling techniques are now mature and offer the possibility to model these interests and needs in a principled manner. In this project we will investigate new generative models for describing users’ querying and subsequent click behaviour, and evaluate the effectiveness of these models for the personalization of Web search results.

Learning Dynamic Object-Oriented Bayesian Networks

Object-oriented Bayesian networks (OOBNs), properly defined, include class hierarchies of types of subnetworks, abstraction from the internal details of subnetwork structure, and partial inheritance of parameters, internal structure and external structural connections (Hoang, 2013). The main point of defining OOBNs and providing programmatic support for their use in BN modeling tools is to simplify the generation and testing of models in large-scale problems with repeating subnets, such as in GIS applications. Hoang (2013) has begun this work, but it is incomplete.

GIS applications very often involve not just replications of subnets over space, but also over time, with the future states of subnets heavily dependent upon their immediate past (as in Markov chains). Dynamic Bayesian networks (DBNs) were invented for modelling just such dynamic processes. The combination of dynamics with object orientation, dynamic OOBNs or DOOBNs, is a recent idea (e.g., Wang et al., 2012), which can only be brought to fruition with proper OOBNs, which we are only just now developing. Potential applications include the environmental sciences, climate change, meteorological modelling, ecology, population biology, epidemiology and economics.

Despite the compactness of OOBN representations, the modelling problems for developing full-scale OOBNs or DOOBNs for these domains can be very large. Automated learning of DBNs, OOBNs and finally DOOBNs will likely be crucial to widespread adoption of the technology. Thus far we have developed a demonstrably world-class DBN learner, by adapting the MML score for Bayesian networks in the causal discovery program CaMML to properly score the repeating structures of dynamic BNs (Black et al., 2014). This project will first adapt CaMML to learning the repeating structures of OOBNs, replicating subnets over space with minor variations, and then combine the two methods for the learning of DOOBNs, replicating subnets over both space and time, with variations. MML, by taking advantage of compression in joint coding of data and models, is ideally suited to learning replicated structures, so we can expect at least as much success in dealing with DOOBNs as we have had with learning DBNs.

References

A. Black, K.B. Korb and A.E. Nicholson (2014) Intrinsic learning of dynamic Bayesian networks. ACM Transactions on Intelligent Systems and Technology, submitted.

T.X. Hoang (2013) Inheritance in Object Oriented Bayesian Networks. Honours Thesis, Monash University.

R. Wang, L. Ma, C. Yan and J. Mathew (2012) Preliminary Study on Bridge Health Prediction Using Dynamic Objective Oriented Bayesian Network (DOOBN). In Engineering Asset Management and Infrastructure Sustainability, pp 1027-1042. Springer.

Learning from big data

As data quantities continue to grow rapidly, there is ever-increasing demand for efficient and effective machine learning techniques for analysing very large datasets. This project is based on the hypothesis that large data quantities call for quite different types of learning algorithms than small data quantities. Specifically, there is less need to control learning variance and greater need to minimise learning bias. This project will develop computationally efficient, low-bias learning algorithms suited to effective learning from big data.

Non-parametric Bayesian Models for Text

Increasingly, non-parametric Bayesian statistical modelling is being used to model the complex, structured aspects of text: grammar, part of speech, semantic content, etc. This is especially useful in semi-supervised contexts, where only limited tagging is available. This project will apply recent non-parametric methods to achieve semi-supervised learning in some text problem.

Learning Latent Variable Causal Models

Linear causal modeling is a long-established, highly productive tool in the psychological, social and biological sciences, beginning with the work of geneticist Sewall Wright in the 1920s. Interest in modeling with latent variables in linear models goes back even further, beginning with Spearman's work on intelligence in 1904. Latent variables help deal with the influence of causal factors which are not directly measured. Many variables (such as intelligence) cannot be directly measured; many others may be, but we don't know about them at the time when we are making observations (as with Galileo and gravity). Regardless, at least in many cases, the influence of unobserved variables is real and important, and proper modeling cannot be done without them.

Automated methods of learning non-linear causal models have been a major area of research for 20 years, since the ground-breaking work of Pearl (1988) on Bayesian networks. Research on how to integrate the learning of latent variables in the learning of causal models has been fragmented and incomplete. Nir Friedman has a number of papers on the subject, generalizing an Expectation Maximization iterative process (Friedman, 1997) for coping with missing values to one which copes with all values of a variable missing (i.e., latency); this process requires knowing or guessing where hidden variables may be in the model and seeding the learning process accordingly. Spirtes, Glymour and Scheines (2000) describe various learning algorithms that include tests for the possible presence of latent variables, using sequences of local significance tests to construct causal models.

In this project we will explore adding latent variable discovery in an integral, non-ad hoc way to the learning process of a well-established search-and-score causal discovery program, Causal discovery via MML (CaMML). CaMML has been successfully applied to many different discovery problems, but not yet latent variable models. This will build on preliminary work in preprocessing for CaMML (Zhang, et al., 2014), identifying probabilistic dependency structures that are best explained with latent variables in a small subnet. This information will be used by CaMML to explore and score an enlarged model space, returning the best causal model for explaining observed data, whether or not that best model includes latent variables.

References
N. Friedman (1997) Learning belief networks in the presence of missing values and hidden variables. Int Conf on Machine Learning, 125-133.
Korb & Nicholson (2011). Bayesian Artificial Intelligence.
J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Spirtes, Glymour and Scheines (2000). Causation, Prediction and Search, 2nd ed. MIT Press.
Xuhui Zhang, Kevin B. Korb, Ann E. Nicholson and Steven Mascaro (2014). Dependency Patterns for Latent Variable Discovery. Euro Conf on Machine Learning, submitted.

Scalable Semi-supervised Learning for Structured Prediction

"Semi-supervised learning" and "Structured prediction" are two major challenges in statistical machine learning. We propose to tackle these two challenging problems together, because their simultaneous solution is critical in many real-world problems, including those in the analysis of text, image, audio, and biological sequences. As the data quantities in these applications are growing rapidly, it is critical that the algorithms be highly scalable. This project deals with scalable probabilistic models for structured prediction, which can be trained using labeled and un-labeled data (aka the semi-supervised learning scenario).

Scientific visualization and analysis of simulated epidemiological data

Computational simulation models are an increasingly important tool for exploring and understanding patterns of health and disease in human populations. The output of these models is complex and multidimensional, and requires innovative approaches to data analysis, visualisation and interpretation.

An APA equivalent scholarship or top-up for a scholarship holder is being offered for a PhD research candidate who will focus on data analysis and visualisation of epidemiological agent-based simulations. The new visualisation approaches will integrate information on population structure, social networks, spatial location and disease state to provide novel insights into the links between population structure and disease transmission.

This is a joint project of the Clayton School of Information Technology at Monash University and the School of Population Health at the University of Melbourne. The project also involves collaboration with social network researchers in the School of Psychology at the University of Melbourne, and the emerging e-research platform being developed by the Australian Urban Research Infrastructure Network.

Prior experience in information visualisation, scientific visualisation, data analysis, human-computer interaction, computer graphics or geographic information systems would be advantageous. However, we encourage all well-qualified candidates (MSc or good Honours degree) with a strong interest in this research area to apply.

Statistical models of documents and text to support understanding

Various models of documents and text have been proposed that address different aspects of the content: the linguistic content (natural language and named entities), the document structure (sections, paragraphs), the topical content (issues, themes), and the argumentation and affective content (sentiment). Probabilistic models using latent variables give state-of-the-art performance for some of these aspects, and for others they are near the state of the art. This project will consider some subset of these aspects and build a combined probabilistic model that pushes the boundaries along one aspect. The project will explore the model's effects on measurable tasks such as information retrieval, language processing or document compression.

This is an abstract project in the sense that the initial outcome will be a stronger model of a document that can subsequently be used in other tasks. The scope of the project can be varied, in that the focus could be extended to include the development of the task itself. The main task we consider is the interpretation or understanding of a large document or small corpus: how can we lay out the key aspects on a small website to aid human understanding?

 

Modelling, Optimisation and Visualisation

Funding Project Title Supervisors

Adaptive Optimisation of Complex Combinatorial Problems

Today, many optimisation problems involve complex combinatorial systems that make traditional approaches unsuitable or intractable. The central aim of this project is to devise techniques which solve complex combinatorial optimisation problems by adapting the optimisation strategy to the problem being solved, based on problem features, such as search space difficulty.

The Adaptive Optimisation approach will provide means of understanding and characterising unknown complex optimisation problems, mainly in Software Engineering, and of devising the most suitable optimisation algorithms to solve them. This will facilitate the transfer of optimisation algorithms to industrial settings, where practitioners may not have any knowledge of optimisation.

Optimal network diagram layout using constraint optimisation

Visualising complex network or graph structures in the form of diagrams has been an important practice in many disciplines for many years. Example application domains include Software Engineering, Biology and Social Network Analysis, to name but a few. Automatically laying out these diagrams to make them easy to read and aesthetically pleasing is a difficult problem. Various special-purpose algorithms and heuristics have been developed over the decades to try to obtain such layouts, and they can provide reasonable results in some cases. However, the results frequently look bad to a human, and people looking at the diagrams can often see obvious improvements that the algorithms miss.

With this project, we aim to use modern constraint optimisation techniques, which allow us to decouple the modelling of layout requirements (the aesthetics, specific diagramming conventions for different applications, etc.) from the details of optimisation. This will allow us to rapidly experiment with different layout designs and to obtain better layouts using state-of-the-art solver technologies. The intention is to develop better diagramming tools that enable practitioners to achieve better-quality layouts and to interact with the layout of their diagrams as never before.

Effective Profiling of Combinatorial Optimisation Problems

Constraint Programming is specifically designed to solve combinatorial optimisation problems, that is, problems that require finding a combination of choices that satisfies a set of constraints and optimises an objective function. This is, for example, the case when looking for new planes, crews and times to replace a delayed flight, or when finding a production schedule in a manufacturing company that reduces waiting while maximising profits. Finding high quality solutions to combinatorial optimisation problems allows us to make the very most of limited resources. This is beneficial for our industry, our hospitals, our security and our environment, and is also a key to wiser investment, better engineering, and accelerated bioinformatics. However, designing programs that can solve optimisation problems effectively requires an iterative process that is often extremely challenging, time consuming and costly, particularly for large-scale problems.

This project will investigate information collected during the execution of a constraint program that can be analysed and summarised in such a way as to help users understand program performance. The results will help users to design scalable, efficient optimisation programs.

Software engineering tools for modelling constraint problems

Constraint programming is a rapidly advancing paradigm that aims to separate the specification of difficult problems from the algorithmic details of finding their solution. A constraint program consists of a declarative description of the problem as an objective function and constraints. The optimisation of the objective function subject to the constraints is then carried out by a generic solver. Rapid progress in AI and solver techniques has made constraint programming a valuable tool for transport scheduling, valuing financial instruments, designing efficient energy networks, and many other difficult optimisation problems.
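As a small illustration of the paradigm, the toy assignment problem below is stated purely as variables, constraints and an objective, and a generic solver finds the optimum; it uses Google OR-Tools CP-SAT as a stand-in solver, and the model and data are invented for illustration.

```python
# Toy assignment problem stated declaratively and handed to a generic CP-SAT solver.
from ortools.sat.python import cp_model

costs = [[4, 7, 3],      # cost of assigning worker i to task j
         [2, 6, 9],
         [5, 4, 8]]

model = cp_model.CpModel()
assign = [[model.NewBoolVar(f"x_{i}_{j}") for j in range(3)] for i in range(3)]

for i in range(3):                                   # each worker does exactly one task
    model.Add(sum(assign[i][j] for j in range(3)) == 1)
for j in range(3):                                   # each task is done by exactly one worker
    model.Add(sum(assign[i][j] for i in range(3)) == 1)

model.Minimize(sum(costs[i][j] * assign[i][j] for i in range(3) for j in range(3)))

solver = cp_model.CpSolver()
if solver.Solve(model) == cp_model.OPTIMAL:
    print("total cost:", solver.ObjectiveValue())
    print([[solver.Value(v) for v in row] for row in assign])
```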

The aim of this project is to adapt state-of-the-art software engineering tools (such as integrated development environments, debugging, and refactoring tools) that have become essential for developers using procedural programming languages, for use with the constraint programming paradigm. To do this well, we will need to understand how people use constraint programming in real-world development, understand what challenges they face, and build tools that enhance their workflow.

Visual exploration of graph databases

Traditionally, "big data" was stored in relational database management systems that required the data to be modeled in tabular form. Graph databases are an exciting new storage technology that is at the heart of technologies like Google's "Knowledge Graph" and Facebook's "Graph Search". However, graph databases are also causing the industry to rethink how it models and stores all kinds of rich, heterogeneous, interlinked data.

This paradigm shift is causing people to think about their data in new ways. For example, query languages like SPARQL and Gremlin allow data querying and manipulation in terms of graph traversals instead of table joins. These expert programming languages are evolving quickly, but tools for non-experts are lagging behind. To us, thinking of data in terms of graphs opens up exciting new opportunities for dynamic interfaces to allow people to explore their data visually. This project will explore graph visualization techniques and fluid user interfaces for enabling these scenarios. Graphics, Visualization, Human Computer Interaction, User Experience Design and Layout Algorithms and Techniques will all play a part in this project.

Visualising Execution Profiles of Constraint Programs

Constraint programming is a rapidly advancing paradigm that aims to separate the specification of difficult problems from the algorithmic details of finding their solution. A constraint program consists of a declarative description of the problem as an objective function and constraints. The optimisation of the objective function subject to the constraints is then carried out by a generic solver. Rapid progress in AI and solver techniques has made constraint programming a valuable tool for transport scheduling, valuing financial instruments, designing efficient energy networks, and many other difficult optimisation problems.

In a perfect world, the programmer should be able to treat the solver as a 'black box'; allowing her to focus her attention on modelling the problem. However, there are times when insight into the internal state of the solver can help the programmer to modify the program to reduce the size of the search space and hence help the solver to more rapidly find a solution. This is particularly important for large-scale problems involving thousands of variables and constraints.

This project aims to provide visualization techniques that will allow us to peer inside the 'black box' and understand how program changes affect the internal state of the solver. We will need to develop sophisticated computer-graphics techniques for visually mapping large tree-like search spaces back to program sources in a compelling interactive way.

Fitness Landscape Characterisation for Combinatorial Optimisation Problems

Many optimisation problems involve complex combinatorial systems that make traditional approaches unsuitable or intractable. Meta-heuristics and local search methods are effective optimisation techniques for such problems, finding near-optimal solutions to complex combinatorial optimisation problems where exact solutions are hard to find. They have been successfully applied to problems such as scheduling, vehicle routing, and task allocation. Despite their empirical effectiveness across a vast variety of problems, we still lack a theoretical understanding of these methods. The question 'What makes a problem difficult for an optimisation algorithm?' is yet to be answered.

This PhD project aims to answer this research question by developing metrics for describing the structure of fitness landscapes and relating these to the effectiveness of particular meta-heuristic methods. A fitness landscape is composed of the solutions, a fitness function, and the search operator, which is part of the optimisation method and connects neighbouring solutions. For the same problem, different optimisation methods create different fitness landscapes. For a given problem, optimisation methods that construct less rugged, and as a result more easily searchable, fitness landscapes are more effective for that particular problem. The metrics developed in this project will shed light on the relationship between a problem and an optimisation method, by describing the underlying structure of the fitness landscape created by that optimisation method.
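One widely used landscape metric, sketched below for illustration, is the autocorrelation of fitness values along a random walk between neighbouring solutions; low autocorrelation indicates a rugged landscape. The bit-flip neighbourhood and OneMax-style fitness function are placeholders, not project choices.

```python
# Random-walk autocorrelation sketch for a toy binary fitness landscape.
import random

def fitness(solution):
    return sum(solution)                    # toy fitness: count of 1-bits (OneMax)

def random_walk(length=2000, n=50):
    """Record fitness along a random walk through single bit-flip neighbours."""
    sol = [random.randint(0, 1) for _ in range(n)]
    trace = []
    for _ in range(length):
        trace.append(fitness(sol))
        flip = random.randrange(n)          # move to a random neighbour
        sol[flip] = 1 - sol[flip]
    return trace

def autocorrelation(trace, lag=1):
    mean = sum(trace) / len(trace)
    var = sum((f - mean) ** 2 for f in trace)
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean)
              for i in range(len(trace) - lag))
    return cov / var

print("lag-1 autocorrelation:", round(autocorrelation(random_walk()), 3))
```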

Real-time data acquisition for adaptive crowd modelling

This PhD project is part of a larger collaborative project between the faculties of IT, Engineering, and Science, which is developing methods for reliable multi-scale modelling of crowd dynamics for disaster prevention.

This project aims to develop methods that support planning and prediction of crowd movements based on data from past events, as well as adaptive planning for live events as they unfold. This two-fold approach will facilitate superior risk management in urban design and improved emergency response planning. The key to achieving these aims is multi-scale modelling methods together with high-performance simulation and optimisation algorithms specifically designed for these computational models.

The proposed PhD project works at the interface between adaptive optimisation methods and real-time data acquisition for them. It will investigate suitable adaptive optimisation methods and their data requirements; possible ways to acquire the required real-time crowd movement data from a variety of sources and on different timescales (mobile phone activity, traffic flow data, visual flow data, social media activity, etc.); data fusion across such sources; and the comparative utility of these data sources for effective and flexible disaster management.

Visualisation of expressive extensions of the Gene Ontology

The Gene Ontology (GO) (Ashburner et al. 2000) is one of the most widely used biomedical ontologies. It comprehensively describes different aspects of genes and gene products under three broad categories: (1) biological process, (2) cellular component and (3) molecular function. GO is currently being actively extended in the LEGO project to create a new annotation system that provides more extensible and expressive annotations.

This project aims to investigate novel visualisation techniques for GO-LEGO and its integration with the Systems Biology Graphical Notation (SBGN). The information can be seen as networks, and although there are some solutions for network layout, none is particularly tailored towards this application. Specifically, we will investigate how domain knowledge captured in ontologies can assist in creating high-quality layouts.

[1] Ashburner, Michael, et al. "Gene Ontology: tool for the unification of biology." Nature Genetics 25.1 (2000): 25-29.
[2] Le Novère et al. "The Systems Biology Graphical Notation." Nature Biotechnology 27 (2009): 735-741.

What makes combinatorial problems hard to optimise?

Choosing the best algorithm to solve an optimisation problem requires knowledge of the problem domain. It is established that there is no algorithm that outperforms all other algorithms in all problem instances.

Using the concept of a fitness landscape, it is possible to study the structure of the search space of a particular problem and analyse features that make problems hard to solve. Insightful conclusions can be drawn about the optimisation process by exploring the relationship between key characteristics of optimisation problem instances and algorithm behaviour. We are seeking an enthusiastic student to undertake research in the area of combinatorial optimisation.

 

Monash NICTA Scholarships

If you are interested in any of these NICTA projects and scholarships, please apply through the Monash Graduate Research online application form and indicate, where appropriate on the application, that you are interested in one of these scholarships.

Funding Project Title Supervisors

Post-Graduate Cybersecurity Scholarships (Data61) - Opportunities

Read more

Incremental Power Network Transformation with Uncertainty

Read more

 

Sensilab

Funding Project Title Supervisors

Immersive 3D generative modelling as an educational tool

The project will develop an immersive environment for 3D generative modelling for children.

3D printing is now readily available and has proven to be a very successful tool for engaging kids with information technology. This has been recognised by industry and academia, and a number of 3D modelling environments for kids, even of primary school age, have appeared.

However, the environments that are conceptually simple are overly simplistic in their modelling approach. Simply put, kids can model in a much better and more versatile way with Lego than with any virtual environment. One of the major hurdles for children is the conceptual mapping of the on-screen 2D world to the intended 3D object. Another is decomposing an intended step of 3D manipulation into a sequence of 2D interactions. Surprisingly, none of the modelling environments aimed at children uses immersive 3D visualisation and interaction, even though these technologies have now become inexpensive and readily available.

A second, related technology that has proven to be very successful at engaging kids is generative and parametric art, i.e. the generation of art objects by writing programs that create them (or modify a seed object) rather than by direct manipulation of (virtual) materials. Educationally, the big attraction of this technology is that it bridges from the pure use of IT to the world of programming. Any number of educational tools are available for generative art in 2D, and some of these, in particular Processing, are extremely popular and in wide use. However, almost none are available for 3D modelling. The few that exist, like Grasshopper, are part of complex professional tool suites aimed at adults.
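To illustrate what generative 3D modelling means in practice, the short sketch below generates a "twisted tower" from a handful of parameters and writes it to a Wavefront OBJ file that most 3D viewers and slicers can open; the shape, parameters and file name are purely illustrative.

```python
# Generative 3D modelling sketch: a parametric twisted tower written to an OBJ file.
import math

def twisted_tower(layers=40, twist_deg=4.0, size=10.0, height=60.0):
    """Return (vertices, faces) for a square tower twisted about its axis."""
    vertices, faces = [], []
    for i in range(layers):
        angle = math.radians(i * twist_deg)
        z = height * i / (layers - 1)
        for cx, cy in [(-1, -1), (1, -1), (1, 1), (-1, 1)]:     # square corners
            x = size * (cx * math.cos(angle) - cy * math.sin(angle))
            y = size * (cx * math.sin(angle) + cy * math.cos(angle))
            vertices.append((x, y, z))
    for i in range(layers - 1):                                  # side faces between layers
        a = 4 * i
        for j in range(4):
            quad = (a + j, a + (j + 1) % 4, a + 4 + (j + 1) % 4, a + 4 + j)
            faces.append(tuple(v + 1 for v in quad))             # OBJ indices start at 1
    return vertices, faces

vertices, faces = twisted_tower()
with open("tower.obj", "w") as f:
    for x, y, z in vertices:
        f.write(f"v {x:.3f} {y:.3f} {z:.3f}\n")
    for face in faces:
        f.write("f " + " ".join(str(i) for i in face) + "\n")
```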

The project will bring these two areas together by creating an immersive environment for 3D generative modelling for children. This will open new educational opportunities for children and will generate new insights into how children interact with and conceptualise 3D worlds.