The Australian Artificial Intelligence Institute is at the forefront of developing the full range of new AI technologies.
AI Technologies
Advanced AI Tech
The Australian Artificial Intelligence Institute has developed many pioneering AI technologies, including:
- autonomous machine learning for data-driven decision making;
- deep-learning based intelligent systems for healthcare and public safety;
- recommender systems for customer relationship management;
and much more.
Read about some of AAII's new and emerging AI tech innovations:
RFNN: Robust Fuzzy Neural Network With an Adaptive Inference Engine
Fuzzy neural networks (FNNs) have been very successful at handling uncertainty in data using fuzzy mappings and if-then rules. However, they suffer from generalization and dimensionality issues. Although deep neural networks (DNNs) represent a step toward processing high-dimensional data, their capacity to address data uncertainty is limited. Furthermore, deep learning algorithms designed to improve robustness are either time consuming or yield unsatisfactory performance. In this article, we propose a robust fuzzy neural network (RFNN) to overcome these problems. The network contains an adaptive inference engine that is capable of handling samples with high-level uncertainty and high dimensions. Unlike traditional FNNs that use a fuzzy AND operation to calculate the firing strength for each rule, our inference engine is able to learn the firing strength adaptively. It also further processes the uncertainty in membership function values. Taking advantage of the learning ability of neural networks, the acquired fuzzy sets can be learned from training inputs automatically to cover the input space well. Furthermore, the consequent layer uses neural network structures to enhance the reasoning ability of the fuzzy rules when dealing with complex inputs. Experiments on a range of datasets show that RFNN delivers state-of-the-art accuracy even at very high levels of uncertainty. Our code is available online. https://github.com/leijiezhang/RFNN
AAII investigators: Leijie Zhang, Ye Shi, Yu-Cheng Chang, Chin-Teng Lin
AAII research lab: Computational Intelligence and Brain-Computer Interface Lab (CIBCI Lab)
Research project: Discovery project DP210101093; Discovery project DP220100803
Publication details: IEEE Transactions on Cybernetics (DOI: 10.1109/TCYB.2023.3241170)
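For readers unfamiliar with the baseline that the RFNN abstract contrasts itself with, the conventional fuzzy AND firing-strength computation can be sketched as follows. This is a minimal illustration and not the authors' code: the Gaussian membership function, the product t-norm used as the fuzzy AND, and the names `gaussian_membership`, `firing_strengths` and `normalize` are all assumptions chosen for the example.

```python
import math

def gaussian_membership(x, center, width):
    """Degree in (0, 1] to which scalar x belongs to a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def firing_strengths(sample, rules):
    """Firing strength of each rule, using the product t-norm as the fuzzy AND.

    sample: list of feature values.
    rules:  list of rules; each rule is a list of (center, width) pairs,
            one pair per feature (a hypothetical rule encoding).
    """
    strengths = []
    for rule in rules:
        strength = 1.0
        for x, (center, width) in zip(sample, rule):
            strength *= gaussian_membership(x, center, width)
        strengths.append(strength)
    return strengths

def normalize(strengths):
    """Scale firing strengths so that they sum to 1."""
    total = sum(strengths)
    return [s / total for s in strengths]

# Two hypothetical rules over two features.
rules = [
    [(0.0, 1.0), (0.0, 1.0)],  # "both features are near 0"
    [(1.0, 1.0), (1.0, 1.0)],  # "both features are near 1"
]
print(normalize(firing_strengths([0.0, 0.0], rules)))
```

Because the product multiplies one membership value per feature, the firing strength of every rule tends to shrink as the number of features grows. RFNN, according to the abstract, learns the firing strength adaptively instead of fixing it in this way.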
SSD: Multidomain Adaptation With Sample and Source Distillation
Unsupervised multidomain adaptation attracts increasing attention as it delivers richer information when tackling a target task from an unlabeled target domain by leveraging the knowledge attained from labeled source domains. However, it is the quality of training samples, not just the quantity, that influences transfer performance. In this article, we propose a multidomain adaptation method with sample and source distillation (SSD), which develops a two-step selective strategy to distill source samples and define the importance of source domains. To distill samples, the pseudo-labeled target domain is constructed to learn a series of category classifiers to identify transfer and inefficient source samples. To rank domains, the agreements of accepting a target sample as the insider of source domains are estimated by constructing a domain discriminator based on selected transfer source samples. Furthermore, an enhancement mechanism is built by matching selected pseudo-labeled and unlabeled target samples. The degrees of acceptance learned by the domain discriminator are finally employed as source merging weights to predict the target task. The superiority of the proposed SSD is validated on real-world visual classification tasks.
AAII investigators: Keqiuyin Li, Hua Zuo, Jie Lu, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Cybernetics (DOI: 10.1109/TCYB.2023.3236008)
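The final step described in the SSD abstract, using degrees of acceptance as source merging weights, can be illustrated with a small sketch. It assumes that each source domain already provides class probabilities for a target sample and that the acceptance degrees are simply normalized to sum to 1; both the normalization and the function names are assumptions made for illustration, not details taken from the paper.

```python
def merge_source_predictions(source_probs, acceptance):
    """Combine per-source class probabilities, weighted by acceptance degrees.

    source_probs: one class-probability list per source domain.
    acceptance:   one non-negative acceptance degree per source domain.
    """
    total = sum(acceptance)
    weights = [a / total for a in acceptance]  # assumed normalization
    n_classes = len(source_probs[0])
    merged = [0.0] * n_classes
    for w, probs in zip(weights, source_probs):
        for c in range(n_classes):
            merged[c] += w * probs[c]
    return merged

def predict(source_probs, acceptance):
    """Index of the class with the highest merged probability."""
    merged = merge_source_predictions(source_probs, acceptance)
    return max(range(len(merged)), key=merged.__getitem__)

# Three hypothetical source classifiers scoring one target sample (2 classes).
probs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]

print(predict(probs, [0.8, 0.1, 0.1]))  # first source dominates: class 0
print(predict(probs, [1.0, 1.0, 1.0]))  # plain averaging: class 1
```

The two calls show why the weights matter: the same source predictions lead to different target labels depending on how much each source domain is trusted.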
DCA: Dynamic Classifier Alignment for Unsupervised Multi-Source Domain Adaptation
Unsupervised domain adaptation leverages the previously gained knowledge from a labeled source domain to tackle the task from a different but similar unlabeled target domain. Most existing methods focus on transferring knowledge from a single source domain, but the information from a single domain may be inadequate to complete the target task. Some previous studies have turned to multi-view representations to enrich the transferable information. However, they simply concatenate multi-view features, which may result in information redundancy. In this paper, we propose a dynamic classifier alignment (DCA) method for multi-source domain adaptation, which aligns classifiers derived from multi-view features in a sample-wise, automatic way. As proposed, both the importance of each view and the contribution of each source domain are investigated. To determine the degrees of importance of multiple views, an importance learning function is built by generating an auxiliary classifier. To learn the source combination parameters, a domain discriminator is developed to estimate the probability of a sample belonging to multiple source domains. Meanwhile, a self-training strategy is proposed to enhance the cross-domain ability of source classifiers with the assistance of pseudo target labels. Experiments on real-world visual datasets show the superiority of the proposed DCA.
AAII investigators: Keqiuyin Li, Hua Zuo, Jie Lu AO, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Knowledge and Data Engineering (DOI: 10.1109/TKDE.2022.3144423)
MSCLDA: Multi-source contribution learning for domain adaptation
Transfer learning has become an attractive technology to tackle a task from a target domain by leveraging previously acquired knowledge from a similar domain (source domain). Different source domains contain different transferable information. Hence, the source contribution should be taken into account when predicting a target task. In this article, we propose a novel multi-source contribution learning method for domain adaptation (MSCLDA). As proposed, the similarities and diversities of domains are learned simultaneously by extracting multi-view features. Then multi-level distribution matching is employed to improve the transferability of latent features, aiming to reduce misclassification of boundary samples by maximizing discrepancy between different classes and minimizing discrepancy within the same class. Concurrently, instead of averaging source predictions or weighting sources using normalized similarities, the original weights learned by normalizing similarities are adjusted using pseudo target labels to increase the disparities of weight values, which is intended to improve the performance of the final target predictor when the predictions of the sources differ significantly. Experiments on real-world visual datasets demonstrate the superiority of our proposed method.
AAII investigators: Keqiuyin Li, Hua Zuo, Jie Lu AO, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Neural Networks and Learning Systems (DOI: 10.1109/TNNLS.2021.3069982)
SAGN: Multi-stream concept drift self-adaptation using graph neural network
Concept drift is the phenomenon where the data distribution in a data stream changes over time. It is a ubiquitous problem in the real world. For example, a traffic accident would cause a jam for a certain period, leading to a distribution change in traffic speed. Most research in the concept drift field focuses on a single data stream; few studies consider multi-stream environments, which are more in line with application needs. To fill this gap, we propose a multi-stream prediction setting and a multi-stream concept drift self-adaptation framework using a graph neural network, named SAGN. In SAGN, we reconsider the learning procedure of GNN-based predictors from the perspective of concept drift adaptation for multiple streams. By this design, the prediction task is converted into online streaming data tasks in sub-graphs. Each sub-graph corresponds to an adaptation target and will be updated over time. In this way, locally we can overcome drift in each sub-graph by a designed adaptation technique, and globally the correlation between different data streams is well preserved as a graph structure. Therefore, whether drift occurs or not, in one or several streams, SAGN can provide consistently accurate prediction results. We comprehensively tested SAGN on both synthetic and real-world, drift and non-drift data in the multi-step prediction task. The experimental results show that SAGN is able to achieve state-of-the-art performance in most cases.
AAII investigators: Ming Zhou, Yiliao Song, Jie Lu AO, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149; Discovery project DP200100700
Publication details: IEEE Transactions on Knowledge and Data Engineering (TKDE) - forthcoming
MCIMO: Multiclass Classification With Fuzzy-Feature Observations: Theory and Algorithms
The theoretical analysis of multiclass classification has proved that the existing multiclass classification methods can train a classifier with high classification accuracy on the test set, when the instances are precise in the training and test sets with same distribution and enough instances can be collected in the training set. However, one limitation with multiclass classification has not been solved: how to improve the classification accuracy of multiclass classification problems when only imprecise observations are available. Hence, in this article, we propose a novel framework to address a new realistic problem called multiclass classification with imprecise observations (MCIMO), where we need to train a classifier with fuzzy-feature observations. First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity. Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem. The experiments on both synthetic and real-world datasets verify the rationality of our theoretical analysis and the efficacy of the proposed algorithms.
AAII investigators: Guangzhi Ma, Jie Lu, Feng Liu, Zhen Fang, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Cybernetics (DOI: 10.1109/TCYB.2022.3181193)
LIR-eGB: Evolving Gradient Boost - A Pruning Scheme Based on Loss Improvement Ratio for Learning under Concept Drift
In non-stationary environments, data distributions can change over time. This phenomenon is known as concept drift, and the related models need to adapt if they are to remain accurate. With gradient boosting ensemble models, selecting which weak learners to keep or prune to maintain model accuracy under concept drift is a non-trivial research problem. Unlike existing models such as AdaBoost, which can directly compare weak learners’ performance by their accuracy (a metric in [0, 1]), in gradient boosting, weak learners’ performance is measured on different scales. To address the performance measurement scaling issue, we propose a novel criterion to evaluate weak learners in gradient boosting models, called loss improvement ratio (LIR). Based on LIR, we develop two pruning strategies: (1) naive pruning (NP), which simply deletes all learners with increasing loss; (2) statistical pruning (SP), which removes learners if their loss increase meets a significance threshold. We also devise a scheme to dynamically switch between NP and SP to achieve the best performance. We implement the scheme as a concept drift learning algorithm, called evolving gradient boost (LIR-eGB). On average, LIR-eGB delivered the best performance against state-of-the-art methods on both stationary and non-stationary data.
AAII investigators: Kun Wang, Jie Lu, Anjin Liu, Guangquan Zhang, Li Xiong
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Discovery project DP190101733
Publication details: IEEE Transactions on Cybernetics (DOI: 10.1109/TCYB.2021.3109796)
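The idea of naive pruning in the LIR-eGB abstract, deleting learners whose loss increases, can be sketched on a toy additive ensemble. The abstract does not define the loss improvement ratio, so the ratio used here (loss after adding a learner divided by loss before) is purely an assumed stand-in, and the greedy one-pass procedure and the names `mse` and `naive_prune` are likewise hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def naive_prune(learners, X, y, learning_rate=1.0):
    """Keep only weak learners whose addition does not increase the loss on (X, y).

    learners: callables mapping one input to an additive correction.
    Returns (kept_learners, ratios); each ratio is loss_after / loss_before
    for the corresponding learner (an assumed, illustrative ratio).
    """
    pred = [0.0] * len(X)
    kept, ratios = [], []
    for learner in learners:
        loss_before = mse(y, pred)
        candidate = [p + learning_rate * learner(x) for p, x in zip(pred, X)]
        loss_after = mse(y, candidate)
        if loss_before > 0:
            ratio = loss_after / loss_before
        else:
            ratio = 1.0 if loss_after == 0 else float("inf")
        ratios.append(ratio)
        if ratio <= 1.0:  # learner does not increase the loss: keep it
            kept.append(learner)
            pred = candidate
    return kept, ratios

# Recent data follows y = 2x; the second learner reflects an outdated concept.
X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 4.0, 6.0]
learners = [lambda x: x, lambda x: -5.0, lambda x: x]

kept, ratios = naive_prune(learners, X, y)
print(len(kept), ratios)
```

A ratio is scale-free, which is the property the abstract asks of a criterion for comparing gradient boosting learners whose raw losses live on different scales. In this toy run the first and third learners are kept and the second is pruned.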
SummAttacker: Improving the Robustness of Summarization Systems with Dual Augmentation
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models have a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of the summarization systems and explore improving the robustness by data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor is in the latent space, where the attacked inputs bring more variations to the hidden states. Hence, we construct adversarial decoder input and devise manifold softmixing operation in hidden space to introduce more diversity. Experimental results on Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets.
AAII investigators: Guodong Long, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 61st Annual Meeting of the Association for Computational Linguistics - main conference (ACL) 2023 (DOI: https://doi.org/10.48550/arXiv.2306.01090)
CoTASP: Continual Task Allocation in Meta-Policy Network via Sparse Prompting
How can a generalizable meta-policy be trained by continually learning a sequence of tasks? It is a natural human skill yet challenging to achieve with current reinforcement learning: the agent is expected to quickly adapt to new tasks (plasticity) while retaining the common knowledge from previous tasks (stability). We address it by “Continual Task Allocation via Sparse Prompting (CoTASP)”, which learns over-complete dictionaries to produce sparse masks as prompts extracting a sub-network for each task from a meta-policy network. By optimizing the sub-network and prompts alternately, CoTASP updates the meta-policy via training a task-specific policy. The dictionary is then updated to align the optimized prompts with tasks’ embedding, thereby capturing their semantic correlations. Hence, relevant tasks share more neurons in the meta-policy network via similar prompts while cross-task interference causing forgetting is effectively restrained. Given a trained meta-policy with updated dictionaries, new task adaptation reduces to highly efficient sparse prompting and sub-network finetuning. In experiments, CoTASP achieves a promising plasticity-stability trade-off without storing or replaying any past tasks’ experiences and outperforms existing continual and multi-task RL methods on all seen tasks, forgetting reduction, and generalization to unseen tasks. Code is available at https://github.com/stevenyangyj/CoTASP
AAII investigators: Yijun Yang, Jing Jiang, Guodong Long
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 40th International Conference on Machine Learning (ICML) 2023 https://arxiv.org/abs/2305.18444
FPF and k-FPF: Does Continual Learning Equally Forget All Parameters?
Distribution shift (e.g. task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed metrics show that only a few modules are more task-specific and sensitively alter between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them only on a small buffer at the end of any CL method can bring non-trivial improvement. Due to the small number of finetuned parameters, such “Forgetting Prioritized Finetuning (FPF)” is efficient in computation. We further propose a more efficient and simpler method that entirely removes the every-step replay and replaces it with FPF triggered periodically only k times during CL. Surprisingly, this “k-FPF” performs comparably to FPF and outperforms the SOTA CL methods while significantly reducing their computational overhead and cost. In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and k-FPF further excels in efficiency without degrading the accuracy. We also empirically studied the impact of buffer size, epochs per task, and finetuning modules on the cost and accuracy of our methods.
AAII investigators: Haiyan Zhao, Guodong Long, Jing Jiang, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 40th International Conference on Machine Learning (ICML) 2023 (DOI: https://doi.org/10.48550/arXiv.2304.04158)
PFedRec: Dual Personalization on Federated Recommendation
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available: https://github.com/Zhangcx19/IJCAI-23-PFedRec
AAII investigators: Guodong Long, Peng Yan, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 32nd International Joint Conference on Artificial Intelligence (IJCAI) 2023 (DOI: https://doi.org/10.48550/arXiv.2301.08143)
Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data
To tackle the global climate challenge, a collaborative platform for comprehensive weather forecasting on large-scale meteorological data is urgently needed. The main barriers are heterogeneous meteorological sensors across countries and regions, which inevitably cause multivariate heterogeneity and data exposure. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy the communication and computational constraints of low-resource sensors. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.
AAII investigators: Shengchao Chen, Guodong Long, Tao Shen, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 32nd International Joint Conference on Artificial Intelligence (IJCAI) 2023 (DOI: https://doi.org/10.48550/arXiv.2301.09152)
UnifieR: A Unified Retriever for Large-Scale Retrieval
Large-scale retrieval is to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms unveil the PLMs' representation capability in different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. A uni-retrieval scheme is further presented with even better retrieval quality. We lastly evaluate the model on BEIR benchmark to verify its transferability.
AAII investigators: Tao Shen, Guodong Long
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The 29th ACM SIG Knowledge Discovery and Data Mining (KDD) 2023 (DOI: https://doi.org/10.48550/arXiv.2205.11194)
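The two retrieval paradigms named in the UnifieR abstract can be contrasted with a toy scoring sketch. It assumes a dot product for dense vectors and a sum over shared term weights for lexicon-based representations; the vectors, term weights and function names are invented for illustration, and how UnifieR itself produces or combines the two representations is not shown here.

```python
def dense_score(query_vec, doc_vec):
    """Dense-vector paradigm: one dot product between sequence-level embeddings."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def lexicon_score(query_weights, doc_weights):
    """Lexicon-based paradigm: sum of weight products over shared terms."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )

# Hypothetical representations of one query and one document.
query_vec, doc_vec = [0.5, 0.5], [1.0, 0.0]
query_terms = {"drift": 1.0, "stream": 0.5}
doc_terms = {"drift": 2.0, "graph": 1.0}

print(dense_score(query_vec, doc_vec))        # 0.5
print(lexicon_score(query_terms, doc_terms))  # 2.0
```

The dense score compresses each text into one global vector, while the lexicon score only rewards terms the query and document share, which mirrors the global versus local granularity distinction drawn in the abstract.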
CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models
Pretrained conversational agents have been exposed to safety issues, exhibiting a range of stereotypical human biases such as gender bias. However, there are still limited bias categories in current research, and most of them only focus on English. In this paper, we introduce a new Chinese dataset, CHBias, for bias evaluation and mitigation of Chinese conversational language models. Apart from those previous well-explored bias categories, CHBias includes under-explored bias categories, such as ageism and appearance biases, which received less attention. We evaluate two popular pretrained Chinese conversational models, CDial-GPT and EVA2.0, using CHBias. Furthermore, to mitigate different biases, we apply several debiasing methods to the Chinese pretrained models. Experimental results show that these Chinese pretrained models are potentially risky for generating texts that contain social biases, and debiasing methods using the proposed dataset can make response generation less biased while preserving the models’ conversational capabilities.
AAII investigators: Zijing Shi, Ling Chen
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The Annual Meeting of the Association for Computational Linguistics (ACL) 2023 (DOI: https://doi.org/10.48550/arXiv.2305.11262)
MorAL: Stay Moral and Explore - Learn to Behave Morally in Text-based Games
Reinforcement learning (RL) in text-based games has developed rapidly and achieved promising results. However, little effort has been expended to design agents that pursue objectives while behaving morally, which is a critical issue in the field of autonomous agents. In this paper, we propose a general algorithm named Moral Awareness Adaptive Learning (MorAL) that enhances the morality capacity of an agent using a plugin moral-aware learning model. The algorithm allows the agent to execute task learning and morality learning adaptively. The agent selects trajectories from past experiences during task learning. Meanwhile, the trajectories are used to conduct self-imitation learning with a moral-enhanced objective. In order to achieve the trade-off between morality and task progress, the agent uses the combination of task policy and moral policy for action selection. We evaluate on the Jiminy Cricket benchmark, a set of text-based games with various scenes and dense morality annotations. Our experiments demonstrate that, compared with strong contemporary value alignment approaches, the proposed algorithm improves task performance while reducing immoral behaviours in various games.
AAII investigators: Zijing Shi, Ling Chen
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: The International Conference on Learning Representations (ICLR) 2023 (https://openreview.net/pdf?id=CtS2Rs_aYk)
Robust H∞ Pinning Synchronization for Multiweighted Coupled Reaction–Diffusion Neural Networks
This article focuses on the robust H∞ synchronization of two types of coupled reaction–diffusion neural networks with multiple state and spatial diffusion couplings by utilizing pinning adaptive control strategies. First, based on the Lyapunov functional combined with inequality techniques, several sufficient conditions are formulated to ensure H∞ synchronization for these two networks with parameter uncertainties. Moreover, node-based pinning adaptive control strategies are devised to address the robust H∞ synchronization problem. In addition, some criteria of H∞ synchronization for these two networks under parameter uncertainties are developed via edge-based pinning adaptive controllers. Finally, two numerical examples are presented to verify our results.
AAII investigators: Lin-Hao Zhao, Shiping Wen
AAII research lab: Intelligent Computing and Systems (ICS) Lab
Publication details: IEEE Transactions on Cybernetics (2022) (DOI: 10.1109/TCYB.2022.3223713)
Suboptimal Leader-to-Coordination Control for Nonlinear Systems With Switching Topologies: A Learning-Based Method
In the cooperative control for multiagent systems (MASs), the key issues of distributed interaction, nonlinear characteristics, and optimization should be considered simultaneously, which, however, remain intractable theoretically even to this day. Considering these factors, this article investigates leader-to-formation control and optimization for nonlinear MASs using a learning-based method. Under time-varying switching topology, a fully distributed state observer based on neural networks is designed to reconstruct the dynamics and the state trajectory of the leader signal with arbitrary precision under a jointly connected topology assumption. Benefiting from the observers, formation for MASs under switching topologies is transformed into tracking control for each subsystem with continuous state generated by the observers. An augmented system with discounted infinite LQR performance index is considered to optimize the control effect. Due to the complexity of solving the Hamilton-Jacobi-Bellman equation, the optimal value function is approximated by a critic network via the integral reinforcement learning method without the knowledge of drift dynamics. Meanwhile, an actor network is also presented to assure stability. The tracking errors and estimation weighted matrices are proven to be uniformly ultimately bounded. Finally, two illustrative examples are given to show the effectiveness of this method.
AAII investigator: Shiping Wen
AAII research lab: Intelligent Computing and Systems (ICS) Lab
Publication details: IEEE Transactions on Neural Networks and Learning Systems (2023) (DOI: 10.1109/TNNLS.2022.3169417)
Approximate Optimal Control for Nonlinear Systems with Periodic Event-Triggered Mechanism
This article investigates the approximate optimal control problem for nonlinear affine systems under the periodic event-triggered control (PETC) strategy. In terms of optimal control, a theoretical comparison of continuous control, traditional event-based control (ETC), and PETC is made from the perspective of stability convergence, concluding that PETC does not significantly affect the convergence rate compared with ETC. This is the first time PETC has been presented for the optimal control of nonlinear systems. A critic network is introduced to approximate the optimal value function based on the idea of reinforcement learning (RL). It is proven that the discrete updating time series from PETC can also be utilized to determine the updating time of the learning network. In this way, the gradient-based weight estimation for continuous systems is developed in discrete form. Then, the uniformly ultimately bounded (UUB) condition of controlled systems is analyzed to ensure the stability of the designed method. Finally, two illustrative examples are given to show the effectiveness of the method.
AAII investigator: Shiping Wen
AAII research lab: Intelligent Computing and Systems (ICS) Lab
Publication details: IEEE Transactions on Neural Networks and Learning Systems (2023) (DOI: 10.1109/TNNLS.2021.3107550)
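The defining feature of PETC, evaluating the triggering condition only at periodic instants rather than continuously, can be seen in a small simulation. Everything in this sketch is an assumption made for illustration: a scalar linear plant instead of the nonlinear affine systems studied in the paper, a fixed feedback gain instead of a learned critic, a relative-error trigger rule, and forward-Euler integration.

```python
def simulate_petc(a=1.0, b=1.0, k=3.0, x0=1.0, sigma=0.3,
                  dt=0.001, check_every=50, steps=5000):
    """Simulate x' = a*x + b*u with u = -k*x_held under periodic event triggering.

    The trigger condition is evaluated only every `check_every` steps.
    The held state is refreshed when |x_held - x| > sigma * |x|.
    Returns (final_state, number_of_checks, number_of_updates).
    """
    x = x0
    x_held = x0  # last state sample sent to the controller
    checks = 0
    updates = 0
    for step in range(steps):
        if step % check_every == 0:  # periodic sampling instant
            checks += 1
            if abs(x_held - x) > sigma * abs(x):
                x_held = x  # event: refresh the controller input
                updates += 1
        u = -k * x_held
        x += dt * (a * x + b * u)  # forward-Euler step
    return x, checks, updates

final_x, checks, updates = simulate_petc()
print(final_x, checks, updates)
```

In this configuration the open-loop plant is unstable (a > 0), yet the state still decays toward zero although the controller input is refreshed at only a fraction of the periodic checks.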
AutoGMap: Learning to Map Large-Scale Sparse Graphs on Memristive Crossbars
The sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g., social networks and knowledge graphs) on traditional computing architectures (CPU, GPU, or TPU). However, the exploration of large-scale sparse graph computing on processing-in-memory (PIM) platforms (typically with memristive crossbars) is still in its infancy. To implement the computation or storage of large-scale or batch graphs on memristive crossbars, a natural assumption is that a large-scale crossbar is required, but with low utilization. Some recent works question this assumption; to avoid wasting storage and computational resources, fixed-size or progressively scheduled “block partition” schemes have been proposed. However, these methods are coarse-grained or static and are not effectively sparsity-aware. This work proposes a dynamic sparsity-aware mapping scheme generation method that models the problem as a sequential decision-making process and optimizes it with a reinforcement learning (RL) algorithm (REINFORCE). Our generating model, a long short-term memory (LSTM) network combined with the dynamic-fill scheme, achieves remarkable mapping performance on small-scale graph/matrix data (complete mapping costs 43% of the area of the original matrix) and on two large-scale matrices (costing 22.5% of the area on qh882 and 17.1% of the area on qh1484). Our method may be extended to sparse graph computing on other PIM architectures, not limited to memristive device-based platforms.
AAII investigator: Shiping Wen
AAII research lab: Intelligent Computing and Systems (ICS) Lab
Publication details: IEEE Transactions on Neural Networks and Learning Systems (2023) (DOI: 10.1109/TNNLS.2023.3265383)
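The area figures quoted in the AutoGMap abstract are easier to interpret with a baseline in mind. The sketch below computes the fraction of matrix area that a simple fixed-size block partition would need to map, assuming that only blocks containing at least one non-zero entry are placed on crossbars. It illustrates the static scheme that the abstract describes as coarse-grained, not the learned mapping itself, and the function name and counting rule are assumptions.

```python
def block_partition_area_ratio(matrix, block_size):
    """Fraction of the matrix area covered by blocks that contain a non-zero entry."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    used_cells = 0
    for r0 in range(0, n_rows, block_size):
        for c0 in range(0, n_cols, block_size):
            r1 = min(r0 + block_size, n_rows)
            c1 = min(c0 + block_size, n_cols)
            has_nonzero = any(
                matrix[r][c] != 0
                for r in range(r0, r1)
                for c in range(c0, c1)
            )
            if has_nonzero:
                used_cells += (r1 - r0) * (c1 - c0)
    return used_cells / (n_rows * n_cols)

# Adjacency matrix of a hypothetical graph with two disjoint edges.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

print(block_partition_area_ratio(adjacency, 4))  # 1.0
print(block_partition_area_ratio(adjacency, 2))  # 0.5
print(block_partition_area_ratio(adjacency, 1))  # 0.25
```

Smaller blocks waste less area but require more crossbars to manage, and a single block size cannot adapt to where the non-zeros actually fall, which is the motivation the abstract gives for a sparsity-aware scheme.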
Designing Efficient Bit-Level Sparsity-Tolerant Memristive Networks
With the rapid progress of deep neural network (DNN) applications on memristive platforms, there has been a growing interest in the acceleration and compression of memristive networks. As an emerging model optimization technique for memristive platforms, bit-level sparsity training (with fixed-point quantization) can significantly reduce the demand for analog-to-digital converter (ADC) resolution, which is critical for energy and area consumption. However, bit sparsity and fixed-point quantization will inevitably lead to a large performance loss. Different from the existing training and optimization techniques, this work attempts to explore more sparsity-tolerant architectures to compensate for performance degradation. We first empirically demonstrate that in a certain search space (e.g., 4-bit quantized DARTS space), network architectures differ in bit-level sparsity tolerance. It is reasonable and necessary to search for architectures suited to efficient deployment on memristive platforms using neural architecture search (NAS) technology. We further introduce bit-level sparsity-tolerant NAS (BST-NAS), which encapsulates low-precision quantization and bit-level sparsity training into the differentiable NAS, to explore the optimal bit-level sparsity-tolerant architectures. Experimentally, with the same degree of sparsity and experiment settings, our searched architectures achieve promising performance, outperforming the normal NAS-based DARTS-series architectures (about 5.8% higher than that of DARTS-V2 and 2.7% higher than that of PC-DARTS) on CIFAR10.
AAII investigator: Shiping Wen
AAII research lab: Intelligent Computing and Systems (ICS) Lab
Publication details: IEEE Transactions on Neural Networks and Learning Systems (2023) (DOI: 10.1109/TNNLS.2023.3250437)
Robust Gaussian Process Regression With Input Uncertainty: A PAC-Bayes Perspective
The Gaussian process (GP) algorithm is considered a powerful nonparametric learning approach that can provide uncertainty measurements on its predictions. However, the standard GP requires clearly observed data, and unexpected perturbations in the input may cause the learned regression model to be mismatched. In addition, GP lacks good generalization performance guarantees. To deal with data uncertainty and provide a numerical generalization performance guarantee on the unknown data distribution, this article proposes a novel robust noisy input GP (NIGP) algorithm based on the probably approximately correct (PAC) Bayes theory. Furthermore, to reduce the computational complexity, we develop a sparse NIGP algorithm, and then develop a sparse PAC-Bayes NIGP approach. Compared with NIGP algorithms, instead of maximizing the marginal log likelihood, one can optimize the PAC-Bayes bound to pursue a tighter generalization error upper bound. Experiments verify that the NIGP algorithms can attain greater accuracy. In addition, the PAC-NIGP algorithms proposed herein can achieve both robust performance and an improved generalization error upper bound in the face of both uncertain input and output data.
AAII investigators: Tianyu Liu, Jie Lu, Zheng Yan and Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Discovery project DP200100700; International Research Training Program Scholarship (of Australia)
Publication details: IEEE Transactions on Cybernetics 2023 (DOI: 10.1109/TCYB.2022.3191022)
Disentangling Stochastic PDE Dynamics for Unsupervised Video Prediction
Unsupervised video prediction aims to predict future outcomes based on the observed video frames, thus removing the need for supervisory annotations. This research task has been argued to be a key component of intelligent decision-making systems, as it offers the potential to model the underlying patterns of videos. Essentially, the challenge of video prediction is to effectively model the complex spatiotemporal and often uncertain dynamics of high-dimensional video data. In this context, an appealing way of modeling spatiotemporal dynamics is to exploit prior physical knowledge, such as partial differential equations (PDEs). In this article, considering real-world video data as a partly observed stochastic environment, we introduce a new stochastic PDE predictor (SPDE-predictor), which models the spatiotemporal dynamics by approximating a generalized form of PDEs while dealing with the stochasticity. A second contribution is that we disentangle the high-dimensional video prediction into low-dimensional factors of variation: time-varying stochastic PDE dynamics and time-invariant content factors. Extensive experiments on four different video datasets show that the SPDE video prediction model (SPDE-VP) outperforms both deterministic and stochastic state-of-the-art methods. Ablation studies show that this advantage is driven by both PDE dynamics modeling and disentangled representation learning, and highlight their relevance in long-term video prediction.
AAII investigators: Xinheng Wu, Jie Lu, Zheng Yan and Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Discovery project DP200100700
Publication details: IEEE Transactions on Neural Networks and Learning Systems 2023 (DOI: 10.1109/TNNLS.2023.3286890)
An Extremely Simple Algorithm for Source Domain Reconstruction
The aim of unsupervised domain adaptation (UDA) is to utilize knowledge from a source domain to enhance the performance of a given target domain. Due to the lack of accessibility to the target domain’s labels, UDA’s efficacy is highly reliant on the source domain’s quality. However, it is often impractical and expensive to obtain an appropriate transferable source domain. To address this issue, we propose a novel UDA setting, source domain reconstruction (SDR), which seeks to construct a new transferable source domain utilizing labeled source samples and unlabeled target samples. SDR has a significant advantage over the conventional method as it is much less expensive to construct a suitable pseudo-source domain rather than collecting an actual transferable source domain in real-world scenarios. To test the practice of SDR, we investigate SDR theoretically. We propose an easily implementable algorithm, the domain MixUp (DMU), which is motivated by the MixUp strategy, to solve the SDR problem. The algorithm can be used to design a UDA framework to significantly enhance the performance of several existing UDA algorithms. Results from extensive experiments conducted on seven benchmarks (66 UDA tasks) indicate that the reconstructed source domain has stronger transferability than the original source domain.
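The abstract does not spell out the DMU algorithm, so the following is only a sketch of the generic MixUp interpolation it is motivated by, applied across domains: a labeled source sample is blended with an unlabeled target sample using a coefficient drawn from a Beta distribution. The function names, the choice to keep the source label unchanged, and the value of `alpha` are assumptions made for illustration.

```python
import random


def sample_lambda(alpha, rng):
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha),
    as in the standard MixUp strategy."""
    return rng.betavariate(alpha, alpha)


def domain_mixup(source_x, target_x, lam):
    """Convex combination of one source feature vector and one target
    feature vector; lam = 1 returns the pure source sample."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(source_x, target_x)]


rng = random.Random(0)
lam = sample_lambda(alpha=0.2, rng=rng)
source_sample, source_label = [1.0, 0.0], "cat"
target_sample = [0.0, 1.0]  # unlabeled target sample

# Assumed labeling rule for this sketch: the mixed sample keeps the source label.
mixed = (domain_mixup(source_sample, target_sample, lam), source_label)
print(mixed)
```

Samples built this way sit between the two domains in feature space, which is one intuitive reading of a "reconstructed" source domain that is easier to transfer from.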
AAII investigators: Zhen Fang, Jie Lu, Zheng Yan and Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Cybernetics 2023 (DOI: 10.1109/TCYB.2023.3298830)
Is Out-of-Distribution Detection Learnable?
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To relax this assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which has been posed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical support for several representative OOD detection works based on our OOD theory.
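For readers unfamiliar with the task itself, the sketch below shows one widely used baseline detector, thresholding the maximum softmax probability. It is not the method analysed in the paper, which is theoretical; the threshold value and function names are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits, threshold=0.5):
    """Flag a sample as OOD when the classifier's top class probability
    falls below the (assumed) confidence threshold."""
    return max(softmax(logits)) < threshold


print(is_ood([10.0, 0.0, 0.0]))  # confident prediction, treated as in-distribution
print(is_ood([0.1, 0.0, 0.0]))   # near-uniform prediction, flagged as OOD
```

The learnability question studied in the paper asks when any such detector can be guaranteed to generalize, given that no OOD samples are available during training.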
AAII investigators: Zhen Fang, Jie Lu
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: Advances in Neural Information Processing Systems (NeurIPS 2022)
Attention-Bridging TS Fuzzy Rules for Universal Multi-Domain Adaptation without Source Data
Universal source-free domain adaptation is an urgent, unsolved problem in transfer learning. Existing source-free and universal domain adaptation methods ignore soft information and data uncertainty during transfer resulting from limited precise target labels and the data shift between the source and target domains, which can lead to negative transfer. To solve this problem, we propose a deep attention-bridging model based on Takagi-Sugeno fuzzy rules for universal domain adaptation. The model is designed to cope with some of the most challenging situations in transfer learning, where there are multiple source domains but access to the source data is not available. To build fuzzy rules in the source domain(s), similar source classes are grouped into clusters that share a rule. Then, through an extension of fuzzy c-means clustering to deep structures, the model learns the fuzzy cluster prototypes and their memberships. To predict the target task, thresholds for dividing the shared classes from the unknown classes are defined through an entropy assumption and fuzzy outputs. Here, the soft class centers generated from the model parameters, which are based on a set of selected shared classes, are designed to match the target data to the source feature space. Last, a combination strategy involving an attention layer provides the final predictions. Experiments with real-world datasets confirm the strength of the proposed fuzzy model.
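The model extends fuzzy c-means clustering to deep structures; as background, the sketch below computes the classical fuzzy c-means membership of a single point with respect to fixed cluster prototypes. It covers only the textbook membership update on 1-D data, not the deep extension; the function name and the fuzzifier `m=2.0` are assumptions for illustration.

```python
def fcm_memberships(point, centers, m=2.0):
    """Classical fuzzy c-means memberships of a 1-D point:
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i."""
    dists = [abs(point - c) for c in centers]
    # If the point coincides with one or more centers, share full
    # membership among those centers to avoid dividing by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


print(fcm_memberships(0.5, [0.0, 2.0]))
```

Unlike a hard assignment, each point keeps a graded membership in every cluster, and the memberships sum to one; this is the "soft information" that the fuzzy rules carry into the transfer.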
AAII investigators: Keqiuyin Li, Hua Zuo, Jie Lu, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: FUZZ-IEEE 2023 (Best Paper)
Meta OOD Learning for Continuously Adaptive OOD Detection
Out-of-distribution (OOD) detection is crucial to modern deep learning applications, as it identifies and flags OOD samples that should not be tested or used for making predictions. Current OOD detection methods have made significant progress when in-distribution (ID) and OOD samples are drawn from static distributions. However, this can be unrealistic when applied to real-world systems, which often undergo continuous variations and shifts in ID and OOD distributions over time. Therefore, for effective application in real-world systems, the development of OOD detection methods that can adapt to these dynamic and evolving distributions is essential. In this paper, we propose a novel and more realistic setting called continuously adaptive out-of-distribution (CAOOD) detection, which aims to develop an OOD detection model that enables dynamic and quick adaptation to a newly arriving distribution with insufficient ID samples during deployment time. To address CAOOD, we develop meta OOD learning (MOL) by designing a learning-to-adapt diagram such that a well-initialized OOD detection model is learned during the training process. In the testing process, MOL ensures OOD detection performance over shifting distributions by quickly adapting to new distributions with a few adaptations. Extensive experiments on several OOD benchmarks endorse the effectiveness of our method in preserving both ID classification accuracy and OOD detection performance on continuously shifting distributions.
AAII investigators: Xinheng Wu, Jie Lu, Zhen Fang, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Discovery project DP200100700
Publication details: 2023 International Conference on Computer Vision (ICCV)
Source-Free Multi-Domain Adaptation with Fuzzy Rule-based Deep Neural Networks
Unsupervised domain adaptation deals with a task from an unlabeled target domain by leveraging the knowledge gained from labeled source domain(s). Fuzzy systems are adopted in domain adaptation to better tackle the uncertainty caused by information scarcity in the transfer. However, existing domain adaptation methods rarely deal with any soft information component due to data imprecision. In addition, few methods handle multiple source domains, which provide richer transfer information. Thus, in this paper, we propose source-free multi-domain adaptation with fuzzy rule-based deep neural networks (SF-FDN), which takes advantage of a fuzzy system to handle data uncertainty in domain adaptation without source data. To learn source private models with high generality, which is important for collecting low-noise pseudo target labels, auxiliary tasks are designed by jointly training source models from multiple domains which share source parameters and fuzzy rules while protecting source data. To transfer fuzzy rules and fit source private parameters to the target domain, self-supervised learning and anchor-based alignment are built to force target data to source feature spaces. Experiments on real-world datasets under both homogeneous and heterogeneous label space scenarios are carried out to validate the proposed method. The results indicate the superiority of the proposed fuzzy rule-based source-free multi-domain adaptation method.
AAII investigators: Keqiuyin Li, Hua Zuo, Jie Lu, Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Research project: Laureate project FL190100149
Publication details: IEEE Transactions on Fuzzy Systems 2023 (DOI: 10.1109/TFUZZ.2023.3276978)
Data driven science for clinically actionable knowledge in diseases
Data-driven science has become a major decision-making aid for the diagnosis and treatment of disease. Computational and visual analytics enables effective exploration and sense making of large and complex data through the deployment of appropriate data science methods, meaningful visualisation and human-information interaction.
This edited volume covers state-of-the-art theory, method, models, design, evaluation and applications in computational and visual analytics in desktop, mobile and immersive environments for analysing biomedical and health data. The book is focused on data-driven integral analysis, including computational methods and visual analytics practices and solutions for discovering actionable knowledge in support of clinical actions in real environments.
By studying how data and visual analytics have been implemented into the healthcare domain, the book demonstrates how analytics influences the domain through improving decision making, specifying diagnostics, selecting the best treatments and generating clinical certainty.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: Nguyen Q.V., Simoff, S.J., Kennedy, P.J. and Catchpoole, D.R. (eds) 2024, Data Driven Science for Clinically Actionable Knowledge in Diseases, CRC Press, Boca Raton FL.
Inferring Actual Treatment Pathways from Patient Records
Objective: Treatment pathways are step-by-step plans outlining the recommended medical care for specific diseases; they get revised when different treatments are found to improve patient outcomes. Examining health records is an important part of this revision process, but inferring patients’ actual treatments from health data is challenging due to complex event-coding schemes and the absence of pathway-related annotations. The objective of this study is to develop a method for inferring actual treatment steps for a particular patient group from administrative health records — a common form of tabular healthcare data — and address several technique- and methodology-based gaps in treatment pathway-inference research.
Methods: We introduce Defrag, a method for examining health records to infer the real-world treatment steps for a particular patient group. Defrag learns the semantic and temporal meaning of healthcare event sequences, allowing it to reliably infer treatment steps from complex healthcare data. To our knowledge, Defrag is the first pathway-inference method to utilise a neural network (NN), an approach made possible by a novel, self-supervised learning objective. We also developed a testing and validation framework for pathway inference, which we use to characterise and evaluate Defrag’s pathway inference ability, establish benchmarks, and compare against baselines.
Results: We demonstrate Defrag’s effectiveness by identifying best-practice pathway fragments for breast cancer, lung cancer, and melanoma in public healthcare records. Additionally, we use synthetic data experiments to demonstrate the characteristics of the Defrag inference method, and to compare Defrag to several baselines, where it significantly outperforms non-NN-based methods.
Conclusions: Defrag offers an innovative and effective approach for inferring treatment pathways from complex health data. Defrag significantly outperforms several existing pathway-inference methods, but computationally-derived treatment pathways are still difficult to compare against clinical guidelines. Furthermore, the open-source code for Defrag and the testing framework are provided to encourage further research in this area.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: A. Caruana, M. Bandara, K. Musial, D. Catchpoole, Paul J. Kennedy, Inferring Actual Treatment Pathways from Patient Records, Journal of Biomedical Informatics. 148 (2023) 104554.
Bibliometric analysis of parasite vaccine research from 1990 to 2019
Bibliometric and bibliographic analyses are popular tools for investigating publication metrics and thematic transitions in an expanding codex of biomedical literature. Bibliometric techniques have been employed in parasitology and vaccinology, with only a few malaria-specific literature analyses being reported specifically on parasite vaccines. The pursuit of parasite prophylactics is an important, global endeavour both medically and economically. As such, a comprehensive understanding of the research topics would be a valuable tool in assessing the current status and future directions of parasite vaccine development. Consequently, this study investigated parasite vaccinology from 1990 to 2019 by analysing literature exported from the Web of Science and Dimensions databases using two, commonly used, bibliometric programs: SciMAT and VOSviewer. The results of this study show the common, emerging, and transient themes within the discipline, and where the future lies as vaccine development moves further into the age of omics and informatics.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: Louis de Couvreur, Manuel Jesus Cobo, Paul J Kennedy, John T Ellis, Bibliometric analysis of parasite vaccine research from 1990 to 2019, Vaccine, Volume 41, Issue 44, 20 October 2023, pages 6468-6477. (DOI: https://doi.org/10.1016/j.vaccine.2023.09.035).
Machine learning for administrative health records: A systematic review of techniques and applications
Machine learning provides many powerful and effective techniques for analysing heterogeneous electronic health records (EHR). Administrative Health Records (AHR) are a subset of EHR collected for administrative purposes, and the use of machine learning on AHRs is a growing subfield of EHR analytics. Existing reviews of EHR analytics emphasise that the data-modality of the EHR limits the breadth of suitable machine learning techniques, and pursuable healthcare applications. Despite emphasising the importance of data modality, the literature fails to analyse which techniques and applications are relevant to AHRs. AHRs contain uniquely well-structured, categorically encoded records which are distinct from other data-modalities captured by EHRs, and they can provide valuable information pertaining to how patients interact with the healthcare system.
This paper systematically reviews AHR-based research, analysing 70 relevant studies and spanning multiple databases. We identify and analyse which machine learning techniques are applied to AHRs and which health informatics applications are pursued in AHR-based research. We also analyse how these techniques are applied in pursuit of each application, and identify the limitations of these approaches. We find that while AHR-based studies are disconnected from each other, the use of AHRs in health informatics research is substantial and accelerating. Our synthesis of these studies highlights the utility of AHRs for pursuing increasingly complex and diverse research objectives despite a number of pervading data- and technique-based limitations. Finally, through our findings, we propose a set of future research directions that can enhance the utility of AHR data and machine learning techniques for health informatics research.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: A. Caruana, M. Bandara, K. Musial, D. Catchpoole, Paul J. Kennedy, Machine learning for administrative health records: A systematic review of techniques and applications. Artificial Intelligence In Medicine (2023), Volume 144, October 2023, 102642. (DOI: https://doi.org/10.1016/j.artmed.2023.102642).
A state-of-the-art methodology for high-throughput in silico vaccine discovery against protozoan parasites and exemplified with discovered candidates for Toxoplasma gondii
Vaccine discovery against eukaryotic parasites is not trivial as highlighted by the limited number of known vaccines compared to the number of protozoal diseases that need one. Only three of 17 priority diseases have commercial vaccines. Live and attenuated vaccines have proved to be more effective than subunit vaccines but adversely pose more unacceptable risks. One promising approach for subunit vaccines is in silico vaccine discovery, which predicts protein vaccine candidates given thousands of target organism protein sequences. This approach, nonetheless, is an overarching concept with no standardised guidebook on implementation. No known subunit vaccines against protozoan parasites exist as a result of this approach, and consequently none to emulate. The study goal was to combine current in silico discovery knowledge specific to protozoan parasites and develop a workflow representing a state-of-the-art approach. This approach reflectively integrates a parasite’s biology, a host's immune system defences, and importantly, bioinformatics programs needed to predict vaccine candidates. To demonstrate the workflow effectiveness, every Toxoplasma gondii protein was ranked in its capacity to provide long-term protective immunity. Although testing in animal models is required to validate these predictions, most of the top ranked candidates are supported by publications reinforcing our confidence in the approach.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: Goodswen, S.J., Kennedy, P.J. & Ellis, J.T. A state-of-the-art methodology for high-throughput in silico vaccine discovery against protozoan parasites and exemplified with discovered candidates for Toxoplasma gondii. Sci Rep 13, 8243 (2023). (DOI: https://doi.org/10.1038/s41598-023-34863-9).
A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future
The increasing rates of breast cancer, particularly in emerging economies, have led to interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of the FAIR principles of the eight available datasets. It demonstrates that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use-case. Additionally, the mix of digital captures and scanned film compounds the problem of variability, along with differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards such as the BIRADS criteria for labelling and annotation, and a consistent file format, could markedly improve access and use of larger amounts of standardized data. This, in turn, could be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.
AAII investigator: Paul Kennedy
AAII research lab: Biomedical Data Science Lab
Publication details: Joe Logan, Paul J. Kennedy, Daniel Catchpoole, A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future, Scientific Data, 10:595, 2023. (DOI: https://doi.org/10.1038/s41597-023-02430-6).
Ventral and Dorsal Stream EEG Channels: Key Features for EEG-Based Object Recognition and Identification (HAI Centre)
Object recognition and object identification are multifaceted cognitive operations that require various brain regions to synthesize and process information. Prior research has evidenced the activity of both visual and temporal cortices during these tasks. Notwithstanding their similarities, object recognition and identification are recognized as separate brain functions. Drawing from the two-stream hypothesis, our investigation aims to understand whether the channels within the ventral and dorsal streams contain pertinent information for effective model learning regarding object recognition and identification tasks. By utilizing the data we collected during the object recognition and identification experiment, we scrutinized EEGNet models, trained using channels that replicate the two-stream hypothesis pathways, against a model trained using all available channels. The outcomes reveal that the model trained solely using the temporal region delivered a high accuracy level in classifying four distinct object categories. Specifically, the object recognition and object identification models achieved an accuracy of 89% and 85%, respectively. By incorporating the channels that mimic the ventral stream, the model’s accuracy was further improved, with the object recognition model and object identification model achieving an accuracy of 95% and 94%, respectively. Furthermore, the Grad-CAM result of the trained models revealed a significant contribution from the ventral and dorsal stream channels toward the training of the EEGNet model. The aim of our study is to pinpoint the optimal channel configuration that provides a swift and accurate brain-computer interface system for object recognition and identification.
Investigators: Daniel Leong, Thomas (Tien-Thong) Do, CT Lin.
All authors are with GrapheneX-UTS Human-centric Artificial Intelligence Centre (HAI) and Australian Artificial Intelligence Institute (AAII).
Publication details: Leong D, Do T, Lin CT. Ventral and Dorsal Stream EEG Channels: Key Features for EEG-Based Object Recognition and Identification. IEEE Trans Neural Syst Rehabil Eng. (DOI: 10.1109/TNSRE.2023.3339698).
Fine-Grained Distillation for Long Document Retrieval
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structure heterogeneity and poses a granular-mismatch issue, leading to an inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces global-consistent representations crossing different fine granularity and then applies multi-granular aligned distillation merely during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance.
AAII investigators: Yucheng Zhou, Guodong Long
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Fine-Grained Distillation for Long Document Retrieval”, AAAI 2024 (accepted by 11 Dec 2023)
Causal Reinforcement Learning: A Survey
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.
AAII investigators: Zhihong Deng*, Jing Jiang, Guodong Long, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Causal Reinforcement Learning: A Survey”, TMLR 2024 (accepted by 21 Nov 2023)
Structured Federated Learning through Clustered Additive Modeling
Heterogeneous federated learning (FL) without assuming any structure is challenging due to the conflicts among non-identical data distributions of clients. In practice, clients often comprise near-homogeneous clusters, so training a server-side model per cluster mitigates the conflicts. However, FL with client clustering often suffers from “clustering collapse”, i.e., one cluster’s model excels on an increasing number of clients, and the system reduces to single-model FL. Moreover, cluster-wise models hinder knowledge sharing between clusters and each model depends on fewer clients. Furthermore, the static clustering assumption on data may not hold for dynamically changing models, which are sensitive to cluster imbalance/initialization or outliers. To address these challenges, we propose “Clustered Additive Modeling (CAM)”, which applies a globally shared model Θg on top of the cluster-wise models Θ1:K, i.e., y = h(x; Θg) + f(x; Θk) for clients of cluster k. The global model captures the features shared by all clusters, so Θ1:K are forced to focus on the differences among clusters. To train CAM, we develop a novel Fed-CAM algorithm that alternates between client clustering and training global/cluster models to predict the residual of each other. We can easily modify any existing clustered FL method with CAM and significantly improve its performance without “clustering collapse” in different non-IID settings. We also provide a convergence analysis of the Fed-CAM algorithm.
AAII investigators: Jie Ma, Guodong Long, Jing Jiang, Chengqi Zhang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Structured Federated Learning through Clustered Additive Modeling”, NeurIPS 2023
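The additive form y = h(x; Θg) + f(x; Θk) from the abstract above can be illustrated with a minimal sketch. It assumes toy linear models for both h and f. The function names (`linear`, `cam_predict`, `cluster_residual`, `global_residual`) are hypothetical and are not taken from the Fed-CAM code. It shows only the prediction and the residual targets that each model would fit, not the clustering or federated training loop.

```python
from typing import List, Sequence


def linear(x: Sequence[float], theta: Sequence[float]) -> float:
    """Toy stand-in for h or f: a dot product of features and weights."""
    return sum(xi * ti for xi, ti in zip(x, theta))


def cam_predict(x: Sequence[float], theta_g: Sequence[float],
                theta_clusters: List[Sequence[float]], k: int) -> float:
    """y = h(x; theta_g) + f(x; theta_k) for a client assigned to cluster k."""
    return linear(x, theta_g) + linear(x, theta_clusters[k])


def cluster_residual(x: Sequence[float], y: float,
                     theta_g: Sequence[float]) -> float:
    """Target for the cluster model: what the global model leaves unexplained."""
    return y - linear(x, theta_g)


def global_residual(x: Sequence[float], y: float,
                    theta_clusters: List[Sequence[float]], k: int) -> float:
    """Target for the global model: what cluster k's model leaves unexplained."""
    return y - linear(x, theta_clusters[k])


if __name__ == "__main__":
    theta_g = [1.0, 0.5]                         # shared by all clients
    theta_clusters = [[0.25, 0.0], [-0.5, 0.5]]  # one weight vector per cluster
    x = [2.0, 4.0]
    for k in range(len(theta_clusters)):
        print(k, cam_predict(x, theta_g, theta_clusters, k))
```

Because the two terms are summed, each model only needs to explain what the other leaves over, which is one way to read the abstract's statement that the global and cluster models are trained "to predict the residual of each other".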
Is heterogeneity notorious? Taming heterogeneity to handle test-time shift in federated learning
Federated learning (FL) is an effective machine learning paradigm where multiple clients can train models based on heterogeneous data in a decentralized manner without accessing their private data. However, existing FL systems undergo performance deterioration due to feature-level test-time shifts, which are well investigated in centralized settings but rarely studied in FL. The common non-IID issue in FL usually refers to inter-client heterogeneity during the training phase, while the test-time shift refers to the intra-client heterogeneity during the test phase. To explore the possibility of using inter-client heterogeneity in handling intra-client heterogeneity, we first propose a contrastive learning-based FL framework, namely FedICON, to capture invariant knowledge among heterogeneous clients and consistently tune the model to adapt to test data. Extensive experiments validate the effectiveness of the proposed FedICON in taming heterogeneity to handle test-time shift problems.
AAII investigators: Yue Tan, Guodong Long
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Is heterogeneity notorious? Taming heterogeneity to handle test-time shift in federated learning”, NeurIPS 2023.
False Correlation Reduction for Offline Reinforcement Learning
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
AAII investigators: Zhihong Deng, Jing Jiang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “False Correlation Reduction for Offline Reinforcement Learning”, IEEE Trans. PAMI (accepted on 25 Oct 2023)
Human-Guided Moral Decision Making in Text-based Games
Reinforcement learning (RL) in text-based games has developed rapidly and achieved promising results. However, little effort has been expended to design agents that pursue objectives while behaving morally, which is a critical issue in the field of autonomous agents. In this paper, we propose a general framework named Moral Awareness Adaptive Learning (MorAL) that enhances the morality capacity of an agent using a plugin moral-aware learning model. The framework allows the agent to execute task learning and morality learning adaptively. The agent selects trajectories from past experiences during task learning. Meanwhile, the trajectories are used to conduct self-imitation learning with a moral-enhanced objective. In order to achieve the trade-off between morality and task progress, the agent uses the combination of task policy and moral policy for action selection. We evaluate on the Jiminy Cricket benchmark, a set of text-based games with various scenes and dense morality annotations. Our experiments demonstrate that, compared with strong contemporary value alignment approaches, the proposed framework improves task performance while reducing immoral behaviours in various games.
AAII investigators: Zijing Shi, Ling Chen, Jun Wang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Human-Guided Moral Decision Making in Text-based Games”, AAAI 2024 (accepted on 11 Dec 2023).
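The abstract above says the agent combines a task policy and a moral policy for action selection. The sketch below shows one simple way such a combination could work, assuming each policy outputs a score per candidate action and the two are mixed linearly. The mixing rule, the `moral_weight` parameter and the function name are assumptions for illustration, and the actual MorAL combination may differ.

```python
from typing import Dict


def select_action(task_scores: Dict[str, float],
                  moral_scores: Dict[str, float],
                  moral_weight: float = 0.5) -> str:
    """Pick the action with the best weighted mix of task and moral scores.

    moral_weight is a hypothetical knob: 0.0 ignores morality entirely,
    1.0 ignores task progress entirely.
    """
    combined = {
        action: (1.0 - moral_weight) * task_scores[action]
        + moral_weight * moral_scores[action]
        for action in task_scores
    }
    return max(combined, key=combined.get)


if __name__ == "__main__":
    task = {"steal key": 0.75, "ask for key": 0.5}
    moral = {"steal key": 0.0, "ask for key": 1.0}
    print(select_action(task, moral, moral_weight=0.0))
    print(select_action(task, moral, moral_weight=0.5))
```

With the weight at zero the agent follows the task policy alone. Raising it shifts the choice toward actions the moral policy prefers, which is the trade-off between morality and task progress that the abstract describes.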
CITB: A Benchmark for Continual Instruction Tuning
Continual learning (CL) is a paradigm that aims to replicate the human ability to learn and accumulate knowledge continually without forgetting previous knowledge and transferring it to new tasks. Recent instruction tuning (IT) involves fine-tuning models to make them more adaptable to solving NLP tasks in general. However, it is still uncertain how instruction tuning works in the context of CL tasks. This challenging yet practical problem is formulated as Continual Instruction Tuning (CIT). In this work, we establish a CIT benchmark consisting of learning and evaluation protocols. We curate two long dialogue task streams of different types, InstrDialog and InstrDialog++, to study various CL methods systematically. Our experiments show that existing CL methods do not effectively leverage the rich natural language instructions, and fine-tuning an instruction-tuned model sequentially can yield similar or better results. We further explore different aspects that might affect the learning of CIT. We hope this benchmark will facilitate more research in this direction.
AAII investigators: Zihan Zhang, Ling Chen
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “CITB: A Benchmark for Continual Instruction Tuning”, EMNLP 2023.
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances
Although large language models (LLMs) are impressive in solving various tasks, they can quickly be outdated after deployment. Maintaining their up-to-date status is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with the ever-changing world knowledge without re-training from scratch. We categorize research works systemically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at https://github.com/hyintell/awesome-refreshing-llms
AAII investigators: Zihan Zhang, Ling Chen, Jun Wang
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances”, EMNLP 2023.
Turn-Level Active Learning for Dialogue State Tracking
Dialogue state tracking (DST) plays an important role in task-oriented dialogue systems. However, collecting a large amount of turn-by-turn annotated dialogue data is costly and inefficient. In this paper, we propose a novel turn-level active learning framework for DST to actively select turns in dialogues to annotate. Given the limited labelling budget, experimental results demonstrate the effectiveness of selective annotation of dialogue turns. Additionally, our approach can effectively achieve comparable DST performance to traditional training approaches with significantly less annotated data, which provides a more efficient way to annotate new dialogue data.
AAII investigators: Zihan Zhang, Ling Chen
AAII research lab: Data Science and Knowledge Discovery Lab (DSKD Lab)
Publication details: “Turn-Level Active Learning for Dialogue State Tracking”, EMNLP 2023.
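To make the idea of selecting turns for annotation concrete, here is a minimal sketch. It assumes the current model gives a probability distribution per turn and that one turn per dialogue is chosen by maximum predictive entropy. The abstract does not state the acquisition function, so entropy, the one-turn-per-dialogue budget and the name `select_turns` are assumptions rather than the paper's method.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_turns(dialogues: Dict[str, List[List[float]]]) -> Dict[str, int]:
    """For each dialogue, return the index of the turn the model is least sure about.

    dialogues maps a dialogue id to a list of per-turn probability
    distributions produced by the current model.
    """
    selected = {}
    for dialogue_id, turn_probs in dialogues.items():
        scores = [entropy(p) for p in turn_probs]
        selected[dialogue_id] = max(range(len(scores)), key=scores.__getitem__)
    return selected


if __name__ == "__main__":
    pool = {
        "d1": [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]],
        "d2": [[0.6, 0.4], [0.99, 0.01]],
    }
    print(select_turns(pool))
```

Only the selected turns would be sent to annotators, so the labelling cost grows with the number of dialogues rather than with the total number of turns.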
Toward Autonomous Distributed Clustering
Compared to traditional centralized clustering, distributed clustering offers the advantage of parallel processing of data from different sites, enhancing the efficiency of clustering while preserving the privacy of the data at each site. However, most existing distributed clustering techniques require manual tuning of several parameters or hyperparameters, which can pose challenges for practical applications. This paper introduces a novel parameter-free distributed clustering framework known as distributed torque clustering (DTC). When dealing with data or subdata distributed across various sites, DTC predominantly executes two steps. The first step is a data reduction at each site using torque clustering, and the second step involves performing global clustering with weighted torque clustering. We compare DTC against six state-of-the-art distributed clustering algorithms and automatic centralized clustering techniques on ten large-scale or medium-scale datasets. The results show that the average rank of DTC is at least three times better than those of the other algorithms across all the datasets. Additionally, DTC can accurately predict the ground-truth number of clusters in nine out of ten datasets, further demonstrating its competitive performance and practical potential.
AAII Investigators: Jie Yang, CT Lin
AAII research lab: Computational Intelligence and Brain-Computer Interface Lab (CIBCI Lab)
Funding source: Australian Research Council (ARC) under discovery grant DP210101093 and discovery grant DP220100803
Publication details: IEEE Transactions on Emerging Topics in Computational Intelligence
Authors are also with the Human-centric Artificial Intelligence Centre (HAI)
Enhanced Adjacency-constrained Hierarchical Clustering using Fine-grained Pseudo Labels
Hierarchical clustering is able to provide partitions of different granularity levels. However, most existing hierarchical clustering techniques perform clustering in the original feature space of the data, which may suffer from overlap, sparseness, or other undesirable characteristics, resulting in noncompetitive performance. In the field of deep clustering, learning representations using pseudo labels has recently become a research hotspot. Yet most existing approaches employ coarse-grained pseudo labels, which may contain noise or incorrect labels. Hence, the learned feature space does not produce a competitive model. In this paper, we introduce the idea of fine-grained labels of supervised learning into unsupervised clustering, giving rise to the enhanced adjacency-constrained hierarchical clustering (ECHC) model. The full framework comprises four steps. One, adjacency-constrained hierarchical clustering (CHC) is used to produce relatively pure fine-grained pseudo labels. Two, those fine-grained pseudo labels are used to train a shallow multilayer perceptron to generate good representations. Three, the corresponding representation of each sample in the learned space is used to construct a similarity matrix. Four, CHC is used to generate the final partition based on the similarity matrix. The experimental results show that the proposed ECHC framework not only outperforms 14 shallow clustering methods on eight real-world datasets but also surpasses current state-of-the-art deep clustering models on six real-world datasets. In addition, on five real-world datasets, ECHC achieves comparable results to supervised algorithms.
AAII Investigators: Jie Yang, CT Lin
AAII research lab: Computational Intelligence and Brain-Computer Interface Lab (CIBCI Lab)
Funding source: Australian Research Council (ARC) under discovery grant DP210101093 and discovery grant DP220100803
Publication details: IEEE Transactions on Emerging Topics in Computational Intelligence
Authors are also with the Human-centric Artificial Intelligence Centre (HAI)
Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
Multistream classification poses significant challenges due to the necessity for rapid adaptation in dynamic streaming processes with concept drift. Despite the growing research outcomes in this area, there has been a notable oversight regarding the temporal dynamic relationships between these streams, leading to the issue of negative transfer arising from irrelevant data. In this paper, we propose a novel Online Boosting Adaptive Learning (OBAL) method that effectively addresses this limitation by adaptively learning the dynamic correlation among different streams. Specifically, OBAL operates in a dual-phase mechanism, in the first of which we design an Adaptive Covariate Shift Adaptation (AdaCOSA) algorithm to construct an initialized ensemble model using archived data from various source streams, thus mitigating the covariate shift while learning the dynamic correlations via an adaptive re-weighting strategy. During the online process, we employ a Gaussian Mixture Model-based weighting mechanism, which is seamlessly integrated with the acquired correlations via AdaCOSA to effectively handle asynchronous drift. This approach significantly improves the predictive performance and stability of the target stream. We conduct comprehensive experiments on several synthetic and real-world data streams, encompassing various drifting scenarios and types. The results clearly demonstrate that OBAL achieves remarkable advancements in addressing multistream classification problems by effectively leveraging positive knowledge derived from multiple sources.
AAII Investigators: En Yu, Jie Lu, Bin Zhang and Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Funding source: Australian Research Council (ARC) under Laureate project FL190100149 and Discovery Project DP200100700
Publication details: The 38th Annual AAAI Conference on Artificial Intelligence
Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points
Deep reinforcement learning (DRL) is a powerful tool for learning from interactions within a stationary environment where state transition and reward distributions remain constant throughout the process. Addressing the practical but challenging nonstationary environments with time-varying state transition or reward function changes during the interactions, ingenious solutions are essential for the stability and robustness of DRL agents. A key assumption to cope with nonstationary environments is that the change points between the previous and the new environments are known beforehand. Unfortunately, this assumption is impractical in many cases, such as outdoor robots and online recommendations. To address this problem, this article presents a robust DRL algorithm for nonstationary environments with unknown change points. The algorithm actively detects change points by monitoring the joint distribution of states and actions. A detection boosted, gradient-constrained optimization method then adapts the training of the current policy with the supporting knowledge of formerly well-trained policies. The previous policies and experience help the current policy adapt rapidly to environmental changes. Experiments show that the proposed method accumulates the highest reward among several alternatives and is the fastest to adapt to new environments. This work has compelling potential for increasing the environmental suitability of intelligent agents, such as drones, autonomous vehicles, and underwater robots.
AAII investigators: Zihe Liu, Jie Lu, Junyu Xuan and Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Funding source: Australian Research Council (ARC) under Laureate project FL190100149 and Discovery Project DP200100700
Publication details: IEEE Transactions on Cybernetics
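The abstract above describes detecting change points by monitoring the joint distribution of states and actions. A minimal sketch of that idea follows. It assumes discrete states and actions, and compares a reference window with a recent window using total variation distance against a fixed threshold. The distance measure, the threshold value and all function names are assumptions for illustration, and the article's detector may work differently.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]  # one (state, action) observation


def joint_distribution(pairs: List[Pair]) -> Dict[Pair, float]:
    """Empirical joint distribution over observed (state, action) pairs."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}


def total_variation(p: Dict[Pair, float], q: Dict[Pair, float]) -> float:
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def change_detected(reference: List[Pair], recent: List[Pair],
                    threshold: float = 0.3) -> bool:
    """Flag a change point when the recent window drifts from the reference.

    The threshold is a hypothetical tuning parameter.
    """
    distance = total_variation(joint_distribution(reference),
                               joint_distribution(recent))
    return distance > threshold


if __name__ == "__main__":
    reference = [(0, "left")] * 5 + [(1, "right")] * 5
    same_env = [(1, "right")] * 5 + [(0, "left")] * 5
    new_env = [(2, "left")] * 10
    print(change_detected(reference, same_env))
    print(change_detected(reference, new_env))
```

In continuous state spaces the histogram would need to be replaced by a density estimate or a two-sample test. The sketch only shows the monitoring logic.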
Domain Adaptation with Interval-valued Observations: Theory and Algorithms
Unsupervised Domain Adaptation (UDA) focuses on enhancing the model performance on an unlabeled target domain by leveraging knowledge from a source domain. The source and target domains usually have different distributions. Existing UDA research primarily concentrates on image data characterized by crisp-valued features. However, interval-valued data, where all the observations’ features are described by intervals, is also a common type of data in real-world scenarios. For instance, measurement instruments are unable to provide exact numerical outcomes, instead employing intervals to describe their results. Hence, this paper focuses on the highly challenging context known as domain adaptation with interval-valued observations. In this environment, the objective is to improve classification accuracy within an unlabeled target domain by capitalizing on knowledge gleaned from a labeled source domain, where both domains exclusively feature interval-valued observations. To address this, we first establish an upper bound on the risk in the interval-valued target domain, underpinning our analysis with rigorous theoretical insights. Subsequently, guided by our theoretical analysis, a new model based on Takagi-Sugeno Fuzzy rules and a Self-supervised Pseudo-labeling strategy (SP-TSF) is developed to address the proposed problem. Takagi-Sugeno fuzzy rules are harnessed to handle the inherent uncertainty intrinsic to interval-valued data, while a pseudo-labeling strategy is developed to augment distribution alignment between the source and target domains, each characterized by interval-valued observations. Extensive experiments on both synthetic and real-world datasets verify the rationality of our theoretical analysis and the efficacy of the proposed model.
AAII investigators: Guangzhi Ma; Jie Lu; Feng Liu; Zhen Fang; Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Funding source: Australian Research Council (ARC) under Laureate project FL190100149
Publication details: IEEE Transactions on Fuzzy Systems
Multi-source Domain Adaptation with Interval-Valued Target Data via Fuzzy Neural Networks
Multi-source domain adaptation (MSDA) refers to the task of adapting a model from multiple source domains to a target domain whose distribution differs from those of all source domains. However, most existing MSDA works focus on crisp-valued data, while such data may not be available in some real-world scenarios. For example, data extracted by many measuring devices are not exact numbers but rather intervals. In this paper, a highly challenging problem called MSDA with interval-valued target data is presented. The objective is to learn a new model for interval-valued target data by leveraging knowledge from source models trained on multiple crisp-valued source data. First, a theoretical analysis is given to inform the appropriate combination of multi-source models. Then, we propose a new neural network model based on a fuzzy transformation function and fuzzy distances to address the proposed problem. The fuzzy transformation function is applied to extract valuable crisp-valued information from interval-valued target data, while fuzzy distances are designed to guide the fusion of multiple source models. Experiments on both synthetic and real-world datasets verify the superiority of our proposed MSDA method for classification tasks. Furthermore, the results of the ablation study and parameter sensitivity analysis illustrate the rationality of the proposed fuzzy distance-based model.
AAII investigators: Guangzhi Ma; Jie Lu; Guangquan Zhang
AAII research lab: Decision Systems and e-Service Intelligence Lab (DeSI Lab)
Funding source: Australian Research Council (ARC) under Laureate project FL190100149
Publication details: IEEE Transactions on Fuzzy Systems