Plenary Sessions

Keynotes

Melbourne Data Analytics Platform

Andrew Turpin

BCom, BSc (Hons), PhD

Associate Director, Melbourne Connect.

Director, Melbourne Data Analytics Platform.

Brief Bio

Andrew is a professor of computer science at The University of Melbourne, Australia. His research interests centre on applying efficient algorithms to problems such as data compression, text search, analysis of political texts, human vision testing, and modelling of the neural pathways of human vision. For the last 2.5 years, he has been charged with establishing the Melbourne Data Analytics Platform: a group of academics who specialise in supporting the research of others with cutting-edge data analysis and computational techniques. Formation of this workforce is also part of a wider initiative at Melbourne to boost HPC, cloud, and data management infrastructure for research, for which Prof Turpin is the academic lead. Prior to Melbourne, he worked at RMIT University (Melbourne), Curtin University (Perth, Australia) and Oregon Health & Science University (Portland, USA). Before embarking on an academic career in computer science, he worked for several years as a trainee actuary with Swiss Re (Australia).

Abstract

In this talk I will discuss the development of a workforce of data and computer scientists that can support researchers at our university in making use of digital technology in their research. This workforce (the Melbourne Data Analytics Platform — MDAP) is unique in Australia, and rare in the world, in that it is made up of academics whose “KPIs” are built around supporting research, not necessarily leading independent research. The talk will trace the journey of establishing MDAP at Melbourne: the challenges and the successes. Hopefully some of the lessons learned can also translate to other large organisations attempting to establish data analytics workforces.


Construction of Integrated Materials Data System for Data-driven Materials Research

Yibin Xu

National Institute for Materials Science, Japan

Brief Bio

Yibin Xu is currently the Deputy Director of the Research and Services Division of Materials Data and Integrated System, and the Group Leader of the Data-Driven Inorganic Materials Research Group at the National Institute for Materials Science (NIMS). She received her Ph.D. in engineering in 1994 from the Shanghai Institute of Ceramics, Chinese Academy of Sciences, and her Ph.D. in information science in 2007 from Nagoya University. She was an STA Fellow (1995-1996) and an ITIT Fellow (1996-1997) at the National Industrial Research Institute of Nagoya, and a systems engineer at CTI Co., Ltd. (2000-2002). She has worked at NIMS since 2002, where she is in charge of developing materials databases while also researching multi-scale thermal transport properties, from single crystals to complex material systems. Her recent research interests include the construction of materials big data, and machine-learning-aided design and optimization of functional inorganic materials.

Abstract

Artificial intelligence built on big data is widely expected to change the way materials research is done, improving the efficiency and reducing the cost of materials development. However, despite the great efforts devoted to data collection and database construction since the 1880s, data shortage remains the bottleneck of today's data-driven materials research.

Data integration has been recognized as one of the key issues for materials data systems for decades. In recent years, several data formats and protocols have been proposed for data exchange between different data resources. Nevertheless, material identification remains a confusing problem, owing to the lack of descriptors common to all materials fields. In the work presented here, based on statistics from the NIMS Inorganic Materials Database AtomWork-Adv, we propose a set of descriptors to define a substance and have created a substance dictionary containing more than 158,000 substances. With this dictionary, users can identify their own materials at the substance level and link them to the crystal structure and property data in AtomWork-Adv. We also show an example of a data structure for multiphase and composite materials and how to link them to the substance data.
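To make the linking idea concrete, here is a minimal sketch of how a substance dictionary entry might connect identification descriptors to crystal-structure and property records, and how a composite material could reference its constituent substances. This is not the actual AtomWork-Adv schema; all field names and identifiers are illustrative assumptions.

```python
# Illustrative sketch only; field names and IDs are hypothetical, not the
# AtomWork-Adv schema described in the talk.
from dataclasses import dataclass, field

@dataclass
class Substance:
    substance_id: str                                    # unique key in the dictionary
    chemical_formula: str                                # e.g. "TiO2"
    descriptors: dict = field(default_factory=dict)      # e.g. prototype, space group
    crystal_structure_ids: list = field(default_factory=list)  # links to structure data
    property_record_ids: list = field(default_factory=list)    # links to property data

@dataclass
class CompositeMaterial:
    material_id: str
    # A multiphase/composite material is modelled as links to its
    # constituent substances together with their fractions.
    phases: list = field(default_factory=list)           # [(substance_id, fraction), ...]

# Identify a material at the substance level, then follow the links.
rutile = Substance(
    substance_id="S-000123",
    chemical_formula="TiO2",
    descriptors={"prototype": "rutile", "space_group": "P4_2/mnm"},
    crystal_structure_ids=["CS-9901"],
    property_record_ids=["P-5501", "P-5502"],
)
coating = CompositeMaterial(
    material_id="M-042",
    phases=[("S-000123", 0.7), ("S-000456", 0.3)],
)
```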

To address the data shortage problem, we analyzed the distribution of available property data across substances. This helps us understand the current state of data availability and plan data generation. We also show that a key to the small-data issue is to leverage correlations between properties: since properties dominated by the same physical and chemical factors tend to be strongly correlated, a property with abundant data can serve as a substitute for one for which only a few data points are available. Several examples of our data-driven research with small data sets will be introduced.
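The substitution idea can be illustrated with a toy sketch: fit the correlation between a data-rich property and a data-poor one on the few substances where both are known, then use the abundant property to estimate the scarce one. The synthetic data and the simple linear fit below are illustrative assumptions, not the authors' method.

```python
# Toy illustration of using a correlated, data-rich property as a stand-in
# for a data-poor one. Data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Property A: abundant measurements across many substances.
prop_a = rng.uniform(1.0, 10.0, size=200)

# Property B: measured for only a handful of substances, but correlated with A.
known = rng.choice(200, size=12, replace=False)
prop_b_known = 2.5 * prop_a[known] + rng.normal(0.0, 0.3, size=12)

# Fit the correlation on the small overlapping set...
slope, intercept = np.polyfit(prop_a[known], prop_b_known, deg=1)

# ...then use abundant property A to estimate B where it was never measured.
prop_b_estimated = slope * prop_a + intercept
print(f"B is approximately {slope:.2f}*A + {intercept:.2f}")
```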


The Role of the Human Expert in the Era of Big Data

Emille E. O. Ishida

CNRS/Laboratoire de Physique de Clermont (LPC), Université Clermont-Auvergne (UCA), Clermont-Ferrand, France

Brief Bio

Emille E. O. Ishida is a Brazilian physicist currently working as a research engineer at CNRS, France. She is co-founder of the Cosmostatistics Initiative (COIN) and the SNAD collaboration, and is scientific PI of the Fink broker. She works mainly on machine learning applications to astronomy, with special emphasis on the integration of expert knowledge into the learning cycle (also called adaptive learning techniques). She is also engaged in research on the development of interdisciplinary scientific environments able to foster fruitful collaboration inspired by astronomy.

Abstract

The full exploitation of the next generation of large-scale photometric surveys depends heavily on our ability to provide reliable early-epoch classification based solely on photometric data. In preparation for this scenario, there have been many attempts to apply different machine learning algorithms to a series of classification problems in astronomy. Although different methods show different degrees of success, textbook machine learning methods fail to address a crucial issue: the lack of representativeness between the spectroscopic (training) and photometric (target) samples. In this talk I will show how Active Learning (or optimal experiment design) can be used as a tool for optimizing the construction of spectroscopic samples for classification purposes. I will present results on how designing spectroscopic samples from the beginning of a survey can achieve optimal classification results with far fewer spectra, and show how this strategy is being applied to the current ZTF alert stream by the Fink broker. I will also describe how such strategies have proven effective in the search for scientifically interesting anomalies within the efforts of the SNAD collaboration.
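For readers unfamiliar with the technique, here is a minimal sketch of an active-learning loop with uncertainty sampling: at each step the classifier queries the object it is least sure about, mimicking the choice of which object to "spend" a follow-up spectrum on. The synthetic data and the uncertainty-sampling criterion are assumptions for illustration; the talk's pipelines operate on real light-curve features and alert streams.

```python
# Sketch of active learning via uncertainty sampling; data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
labeled = list(range(20))                   # tiny initial "spectroscopic" sample
unlabeled = [i for i in range(len(X)) if i not in labeled]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
for _ in range(30):                         # each iteration = one follow-up spectrum
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])
    # Uncertainty sampling: query the object closest to the decision boundary.
    query = unlabeled[int(np.argmin(np.abs(proba[:, 1] - 0.5)))]
    labeled.append(query)                   # "take the spectrum" -> obtain the label
    unlabeled.remove(query)

clf.fit(X[labeled], y[labeled])             # final model with the queried labels
print(f"accuracy on the remaining pool: {clf.score(X[unlabeled], y[unlabeled]):.3f}")
```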