Posts by Collection

portfolio

How Expressive are Knowledge Graph Foundation Models? Permalink

Published: February 24, 2025

Our paper ‘How Expressive are Knowledge Graph Foundation Models?’ is now available on arXiv. Check it out for a detailed analysis of the expressive power of KGFMs.

‘Link Prediction with Relational Hypergraphs’ is on arXiv now!

‘Cooperative graph neural networks’ got accepted by ICML 2024!

‘A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification’ got accepted by IEEE Transactions on Cybernetics!

‘One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs’ is on arXiv now!

‘A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge Graph’ got accepted by NeurIPS 2023!

‘Theoretical Insights into Line Graph Transformation on Graph Learning’ got accepted by NeurIPS 2024 Workshop of Symmetry and Geometry in Neural Representations!

publications

An empirical study on two-child policy in china based on statistical analysis and machine learning

Published in Proceedings of the 2nd International Conference on Social Science, Public Health and Education (SSPHE 2018), 2019

Since the universal two-child policy (TCP) in China is launched in 2016, many researchers have dedicated their efforts into investigating the influences from the society point of view. In this paper, we look at this issue from a different angle, trying to investigate how the factors influence whether an expectant mother would bore a second child in China empirically. The real-world data from both rural and city regions are used to train an imbalance classification model. In addition, some statistical hypotheses are also made to justify the relevance of these factors. Experimental results demonstrate the validity and effectiveness of our trained model.

Recommended citation: Yizhou Chi and Xingyue Huang and Yu Zhou, An Empirical Study on Two-child Policy in China Based on Statistical Analysis and Machine Learning, Proceedings of the 2nd International Conference on Social Science, Public Health and Education (SSPHE 2018)
Download Paper | Download Slides

Feature selection of high dimensional data by adaptive potential particle swarm optimization

Published in IEEE Congress on Evolutionary Computation, 2019

For high-dimensional data classification, feature selection has played a significant role in removing the redundant or irrelevant features and improving the performances of classifiers. Particle Swarm Optimization (PSO), an evolutionary computational tool, has been widely and successfully applied in the field of feature selection. A recent pioneer work, Potential PSO (PPSO) which utilizes feature discretization process based on cut-points, provides better overall performance in classification and lower amount of selected features compared with some existing approaches. In this paper, we proposed an adaptive PPSO (APPSO) to reduce the number of features further and increase the classification accuracy. In APPSO, the pre-filtering method, ReliefF, is incorporated into the discretization process to help screen the potential cut-points effectively, and a new updating strategy is proposed to strengthen the searching scope of particles and avoid being trapped into local optima. Also, APPSO follows an adaptive way to alter the random level of the selection of cutpoints. The random level depends on the mean instance number per class of each dataset. Experiments on 10 datasets are carried out, and APPSO is proven to select less than 2% of the number of features while maintaining a very competitive accuracy compared with some state-of-the-art methods.

Recommended citation: X. Huang, Y. Chi and Y. Zhou, "Feature Selection of High Dimensional Data by Adaptive Potential Particle Swarm Optimization," 2019 IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand, 2019, pp. 1052-1059, doi: 10.1109/CEC.2019.8790366.
Download Paper | Download Slides

A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge Graphs

Published in NeurIPS, 2023

Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well-understood for simple graphs, our understanding remains incomplete in the context of knowledge graphs. Our goal is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. The theoretical findings presented in this paper explain the benefits of some widely employed practical design choices, which are validated empirically.

Recommended citation: Xingyue Huang, Miguel Romero Orth, İsmail İlkan Ceylan, and Pablo Barceló. A theory of link prediction via relational weisfeiler-leman on knowledge graphs. In NeurIPS, 2023.
Download Paper | Download Slides

Link Prediction with Relational Hypergraphs

Published in arXiv, 2024

Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to link prediction with relational hypergraphs. The presence of relational hyperedges makes link prediction a task between k nodes for varying choices of k, which is substantially harder than link prediction with knowledge graphs, where every relation is binary (k = 2). In this paper, we propose two frameworks for link prediction with relational hypergraphs and conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms, and also via some natural logical formalisms. Through extensive empirical analysis, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction. Our study therefore unlocks applications of graph neural networks to fully relational structures.

Recommended citation: X Huang, MR Orth, P Barceló, MM Bronstein, İİ Ceylan, Link Prediction with Relational Hypergraphs, arXiv preprint arXiv:2402.04062, 2024
Download Paper | Download Slides

Theoretical Insights into Line Graph Transformation on Graph Learning

Published in NeurIPS 24 Workshop of Symmetry and Geometry in Neural Representations, 2024

Line graph transformation has been widely studied in graph theory, where each node in a line graph corresponds to an edge in the original graph. This has inspired a series of graph neural networks (GNNs) applied to transformed line graphs, which have proven effective in various graph representation learning tasks. However, there is limited theoretical study on how line graph transformation affects the expressivity of GNN models. In this study, we focus on two types of graphs known to be challenging to the Weisfeiler-Leman (WL) tests: Cai-F¨urer-Immerman (CFI) graphs and strongly regular graphs, and show that applying line graph transformation helps exclude these challenging graph properties, thus potentially assist WL tests in distinguishing these graphs. We empirically validate our findings by conducting a series of experiments that compare the accuracy and efficiency of graph isomorphism tests and GNNs on both line-transformed and original graphs across these graph structure types.

Recommended citation: F Yang, X Huang, Theoretical Insights into Line Graph Transformation on Graph Learning, arXiv preprint arXiv:2410.16138, 2024
Download Paper | Download Slides

A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification

Published in IEEE Transactions on Cybernetics, 2024

The development of data sensing technology has generated a vast amount of high-dimensional data, posing great challenges for machine learning models. Over the past decades, despite demonstrating its effectiveness in data classification, genetic programming (GP) has still encountered three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) large feature space. In this article, we have developed a problem-specific multiobjective GP framework (PS-MOGP) for handling classification tasks with high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate the recursive feature elimination strategy based on mining the archive of evolved GP solutions. A progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order according to their objectives, is proposed to evaluate the GP individuals and maintain a better diversity of solutions. Besides, to address the seriously imbalanced class issue caused by traditional binary decomposition (BD) one versus rest (OVR) for multiclass classification problems, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that our proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods in the context of high-dimensional data classification.

Recommended citation: Y. Zhou, N. Yang, X. Huang, J. Lee and S. Kwong, "A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification," in IEEE Transactions on Cybernetics, doi: 10.1109/TCYB.2024.3372070.
Download Paper | Download Slides

Cooperative graph neural networks

Published in ICML, 2024

Graph neural networks are popular architectures for graph machine learning, based on iterative computation of node representations of an input graph through a series of invariant transformations. A large class of graph neural networks follow a standard message-passing paradigm: at every layer, each node state is updated based on an aggregate of messages from its neighborhood. In this work, we propose a novel framework for training graph neural networks, where every node is viewed as a player that can choose to either ‘listen’, ‘broadcast’, ‘listen and broadcast’, or to ‘isolate’. The standard message propagation scheme can then be viewed as a special case of this framework where every node ‘listens and broadcasts’ to all neighbors. Our approach offers a more flexible and dynamic message-passing paradigm, where each node can determine its own strategy based on their state, effectively exploring the graph topology while learning. We provide a theoretical analysis of the new message-passing scheme which is further supported by an extensive empirical analysis on a synthetic dataset and on real-world datasets.

Recommended citation: B Finkelshtein, X Huang, M Bronstein, İİ Ceylan, Cooperative graph neural networks, arXiv preprint arXiv:2310.01267, 2023
Download Paper | Download Slides

One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs

Published in arXiv, 2024

Traditional query answering over knowledge graphs – or broadly over relational data – is one of the most fundamental problems in data management. Motivated by the incompleteness of modern knowledge graphs, a new setup for query answering has emerged, where the goal is to predict answers that do not necessarily appear in the knowledge graph, but are present in its completion. In this work, we propose AnyCQ, a graph neural network model that can classify answers to any conjunctive query on any knowledge graph, following training. At the core of our framework lies a graph neural network model trained using a reinforcement learning objective to answer Boolean queries. Our approach and problem setup differ from existing query answering studies in multiple dimensions. First, we focus on the problem of query answer classification: given a query and a set of possible answers, classify these proposals as true or false relative to the complete knowledge graph. Second, we study the problem of query answer retrieval: given a query, retrieve an answer to the query relative to the complete knowledge graph or decide that no correct solutions exist. Trained on simple, small instances, AnyCQ can generalize to large queries of arbitrary structure, reliably classifying and retrieving answers to samples where existing approaches fail, which is empirically validated on new and challenging benchmarks. Furthermore, we demonstrate that our AnyCQ models effectively transfer to out-of-distribution knowledge graphs, when equipped with a relevant link predictor, highlighting their potential to serve as a general engine for query answering.

Recommended citation: K Olejniczak, X Huang, İİ Ceylan, M Galkin, One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs, arXiv preprint arXiv:2409.13959, 2024
Download Paper | Download Slides

How Expressive are Knowledge Graph Foundation Models?

Published in arXiv, 2025

Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the models expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.

Recommended citation: X Huang, P Barceló, MM Bronstein, İİ Ceylan, M Galkin, JL Reutter, MR Orth, How Expressive are Knowledge Graph Foundation Models?, arXiv preprint arXiv:2502.13339, 2025
Download Paper | Download Slides

talks

teaching

Graph Representation Learning

Master course, Department of Computer Science, University of Oxford, 2023

Practical demostrator for the course Graph representation learning in Michaelmas term 2024, taught by Dr. İsmail İlkan Ceylan.

Geometric Deep Learning

Master course, Department of Computer Science, University of Oxford, 2024

Practical demostrator for the course Geometric Deep Learning in Hilary term 2024, taught by Professor Michael M. Bronstein.