In enterprise systems and scientific research, the exploration of foundation models in complex relational database (RDB) scenarios remains in its early stages.
This is because multi-table interactions and heterogeneous features within RDBs make it difficult for traditional general-purpose large language models to function effectively in such structured environments.
To address this, a team from Peking University led by Zhang Muhan, in collaboration with Amazon Web Services (AWS), has proposed Griffin: a graph-centric foundation model designed specifically for RDBs.
Griffin treats RDBs as dynamic heterogeneous graphs for modeling and reasoning. By pre-training on over 150 million rows of tabular data and then performing supervised fine-tuning, it builds a foundation model with strong transferability and generalization capabilities. The paper has been accepted at ICML 2025, a top-tier international conference.

The Challenge: Complex Inter-table Relationships and Rich Intra-table Semantic Information
Relational databases define data structures through explicit schemas and serve critical sectors such as finance, e-commerce, scientific research, logistics, and government information systems. They are a core digital infrastructure in modern society.
Market forecasts put the global Database Management System (DBMS) market above $133 billion by 2028.
However, intelligent modeling of RDBs faces significant challenges, primarily concentrated in three areas:
- Highly Complex Topological Structures: Data is stored across multiple tables and connected via constraints such as primary keys and foreign keys, forming complex graph structures. Traditional single-table paradigms struggle to capture global context.
- Highly Heterogeneous Features: Table fields encompass various types, including text, numerical values, categories, and time series. The diverse forms of information require models to possess unified representation capabilities.
- Deep Semantic Relationships: Rich explicit and implicit logical relationships exist both within and between tables, posing a substantial challenge to the model’s ability to understand and reason about relations.

The image above illustrates a typical RDB. The green “Purchase Table” records transaction data (each row includes the User ID, the Product ID purchased, the user's rating for the product, and the purchase date). Each row can be linked via the foreign key “User ID” to the corresponding row in the “User Table,” or via the foreign key “Product ID” to the “Product Table,” allowing access to specific user or product information.
Compared to ordinary (single-table) data, RDBs often feature very complex inter-table relationships and rich intra-table semantic information, posing challenges for modeling and foundation model training. Furthermore, the community has long lacked standardized benchmarks that accurately reflect production scenarios.
Benchmarks such as 4DBInfer (arXiv:2404.18209) are gradually filling this gap, providing a unified evaluation platform for new models, including Griffin.
Methodology: Graph-Centric Database Modeling

The core idea of Griffin is to abstract the entire relational database as a temporal heterogeneous graph, performing unified encoding, message passing, and decoding on this graph to capture deep dependencies across tables and time. Specifically, its innovative design can be broken down into the following points:
RDB Data Modeling: Structured Graph Representation and Temporal Awareness
First, Griffin maps each record in a data table to a node in the graph, while primary key-foreign key (PK-FK) constraints are modeled as directed edges with types. This transforms records scattered across multiple tables into a heterogeneous graph, where node/edge types naturally reflect schema information.
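The row-to-node and PK-FK-to-edge mapping described above can be sketched in a few lines. This is a minimal illustration and not Griffin's actual code: the toy tables, the `primary_keys` and `foreign_keys` dictionaries, and the function name `build_hetero_graph` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy database mirroring the user / product / purchase example.
tables = {
    "user": [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bob"}],
    "product": [{"product_id": 10, "title": "Lamp"}],
    "purchase": [
        {"purchase_id": 100, "user_id": 1, "product_id": 10, "rating": 5},
        {"purchase_id": 101, "user_id": 2, "product_id": 10, "rating": 3},
    ],
}
primary_keys = {"user": "user_id", "product": "product_id", "purchase": "purchase_id"}
# (child table, foreign-key column) -> parent table
foreign_keys = {("purchase", "user_id"): "user", ("purchase", "product_id"): "product"}


def build_hetero_graph(tables, primary_keys, foreign_keys):
    """Turn every row into a typed node and every PK-FK link into a typed, directed edge."""
    nodes = {}
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            # The node id is (table name, primary-key value); the table name is the node type.
            nodes[(table, row[pk])] = row

    edges = defaultdict(list)
    for (child, fk_col), parent in foreign_keys.items():
        child_pk = primary_keys[child]
        for row in tables[child]:
            src = (child, row[child_pk])
            dst = (parent, row[fk_col])
            if dst in nodes:  # skip dangling foreign keys
                # The edge type records which schema relation produced the edge.
                edges[(child, fk_col, parent)].append((src, dst))
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(tables, primary_keys, foreign_keys)
```

Because node and edge types are derived directly from table and column names, the schema is carried into the graph without any extra annotation.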
To prevent future information leakage and adhere to causal constraints for production prediction tasks, the model samples “local temporal subgraphs” around target nodes during training and inference: it only includes neighbors with timestamps earlier than the target node.
This sampling process draws on established practices from benchmarks like 4DBInfer, explicitly injecting time direction while ensuring efficiency.
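As a rough sketch of the time-respecting sampling described above, the function below expands a neighborhood around a target node and keeps only neighbors whose timestamp is strictly earlier than the target's. The function name, the `num_hops` and `max_neighbors` parameters, the most-recent-first ordering, and the choice to treat nodes without a timestamp as always admissible are assumptions for illustration; the source does not specify these details.

```python
def sample_temporal_subgraph(target, adjacency, timestamps, num_hops=2, max_neighbors=2):
    """Collect nodes within num_hops of target that are older than target."""
    cutoff = timestamps[target]
    visited = {target}
    frontier = [target]
    for _ in range(num_hops):
        next_frontier = []
        for node in frontier:
            # Assumption: nodes with no timestamp (static rows) count as infinitely old.
            candidates = [
                n for n in adjacency.get(node, [])
                if n not in visited and timestamps.get(n, float("-inf")) < cutoff
            ]
            # Prefer the most recent admissible neighbors, then cap the fan-out.
            candidates.sort(key=lambda n: timestamps.get(n, float("-inf")), reverse=True)
            for n in candidates[:max_neighbors]:
                visited.add(n)
                next_frontier.append(n)
        frontier = next_frontier
    return visited


# Hypothetical example: one user with three purchases at times 1, 5, and 9.
adjacency = {"u1": ["p1", "p2", "p3"], "p1": ["u1"], "p2": ["u1"], "p3": ["u1"]}
timestamps = {"p1": 1, "p2": 5, "p3": 9}  # "u1" has no timestamp
subgraph = sample_temporal_subgraph("p2", adjacency, timestamps)
```

When predicting for purchase `p2`, the later purchase `p3` is excluded, so no information from the future reaches the target.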
Unified Data Encoder: Standardized Representation of Heterogeneous Information
RDBs contain multimodal features such as text/category fields alongside numerical and time-series data. Griffin designs a unified encoding mechanism that converts different types into vectors within the same semantic space:
- Categories & Text: Category values are first mapped to their natural language descriptions, then input along with native text into a pre-trained text encoder (e.g., Nomic Embeddings) to obtain high-dimensional embeddings rich in semantics.
- Numerical Values: Normalized numerical inputs are fed into a pre-trained float encoder (ENC). ENC and its paired decoder (DEC) are trained via a joint reconstruction task: the encoded data must be able to decode back to the original float values without loss. Once the reconstruction error is minimized, the parameters of these two components are frozen.
- Metadata & Task Context: Table names, column names, and edge types are also sent to the text encoder. Additionally, a task description generated based on the current prediction target column participates in attention calculations at all subsequent layers, guiding the model to focus on the target.
Through these steps, original multi-modal information is standardized into a set of high-semantic vectors, laying the foundation for subsequent graph message passing.
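The dispatch logic of such a unified encoder can be illustrated with stand-ins. In the sketch below, `toy_text_encoder` replaces the pre-trained text encoder with a hash-based unit vector, and `toy_float_encoder` / `toy_float_decoder` replace the learned ENC/DEC pair with a closed-form mapping that is trivially invertible. All names, the `DIM` constant, and the "column is value" description template are assumptions; Griffin's real components are trained networks.

```python
import hashlib
import math

DIM = 8  # hypothetical shared embedding width


def toy_text_encoder(text):
    """Stand-in for a pre-trained text encoder: hash the text into a unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:DIM]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def toy_float_encoder(value, mean, std):
    """Stand-in for ENC: normalize the value, then pad with fixed sine features."""
    z = (value - mean) / std
    return [z] + [math.sin(z * k) for k in range(1, DIM)]


def toy_float_decoder(vec, mean, std):
    """Stand-in for DEC: recover the original float from the first coordinate."""
    return vec[0] * std + mean


def encode_cell(column, kind, value, stats=None):
    """Map a cell of any supported type to a DIM-dimensional vector."""
    if kind == "category":
        # Categories are verbalized first, then handled like text.
        return toy_text_encoder(f"{column} is {value}")
    if kind == "text":
        return toy_text_encoder(str(value))
    if kind == "number":
        mean, std = stats
        return toy_float_encoder(value, mean, std)
    raise ValueError(f"unsupported column kind: {kind}")


cat_vec = encode_cell("membership", "category", "gold")
txt_vec = encode_cell("review", "text", "Works well, arrived quickly.")
num_vec = encode_cell("rating", "number", 4.0, stats=(3.0, 1.5))
restored = toy_float_decoder(num_vec, 3.0, 1.5)
```

The point of the sketch is the interface: every cell, whatever its type, ends up as a vector of the same width, and numerical vectors can be decoded back to a float.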
Advanced MPNN Architecture: Deep Relational Reasoning Network
The uniformly encoded graph is fed into Griffin’s custom Message Passing Neural Network (MPNN), which consists of two complementary modules:
Cross-Attention Column-wise Aggregation: For each node, the model generates query vectors using the current node embedding and task embedding. These interact with column metadata and features to dynamically assess the importance of different columns for the current task and perform weighted aggregation. This design naturally satisfies column permutation invariance and can handle tables with varying numbers of columns.
Hierarchical Aggregation for Cross-table Reasoning: At each layer of message passing, neighbor messages of the same edge type are first aggregated via mean pooling, and the results are then combined across edge types via max pooling. This two-stage hierarchical strategy improves stability when processing inter-table associations with complex topologies and varying numbers of neighbors.
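The cross-attention column-wise aggregation described above can be sketched without any deep learning library. This simplified version omits the learned projection matrices a real attention layer would have, forms the query by summing the node and task embeddings, and uses column metadata embeddings as keys and column feature embeddings as values; those choices, and the function names, are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_columns(node_emb, task_emb, column_meta, column_values):
    """Weight each column by its relevance to the (node, task) query and sum."""
    query = [n + t for n, t in zip(node_emb, task_emb)]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in column_meta])
    dim = len(column_values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, column_values)) for i in range(dim)]
    return pooled, weights


# Hypothetical inputs: three columns with 2-d metadata and 3-d feature embeddings.
node_emb = [0.5, 0.1]
task_emb = [0.5, -0.1]
column_meta = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
column_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

pooled, weights = aggregate_columns(node_emb, task_emb, column_meta, column_values)
# Reordering the columns leaves the pooled vector unchanged.
pooled_rev, _ = aggregate_columns(node_emb, task_emb, column_meta[::-1], column_values[::-1])
```

Since the output is a weighted sum over a set of columns, the result does not depend on column order or on how many columns a table has.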
Through multi-layer iteration, the MPNN captures composite dependencies from local to remote nodes, providing information-rich node representations for downstream tasks.
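The two-stage hierarchical aggregation (mean within an edge type, then max across edge types) can be written down directly. The function name and the edge-type labels below are illustrative assumptions, and the sketch treats messages as plain lists of floats.

```python
def hierarchical_aggregate(messages_by_edge_type):
    """Mean-pool messages within each edge type, then max-pool across edge types."""
    per_type = []
    for messages in messages_by_edge_type.values():
        if not messages:
            continue  # an edge type with no neighbors contributes nothing
        dim = len(messages[0])
        per_type.append([sum(m[i] for m in messages) / len(messages) for i in range(dim)])
    if not per_type:
        return None  # isolated node: no incoming messages at all
    # Element-wise max over the per-type means.
    return [max(column) for column in zip(*per_type)]


# Hypothetical messages arriving at one user node from two relations.
incoming = {
    "purchase->user": [[1.0, 4.0], [3.0, 0.0]],  # mean is [2.0, 2.0]
    "review->user": [[0.0, 5.0]],                # mean is [0.0, 5.0]
}
aggregated = hierarchical_aggregate(incoming)  # [2.0, 5.0]
```

The inner mean keeps a relation with thousands of neighbors from drowning out one with a handful, which is one plausible reading of the stability benefit mentioned above.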
Unified Task Decoder: Integrated Solution for Multi-task Outputs
The node vectors output by the MPNN then enter a unified decoder, enabling Griffin to handle multiple prediction tasks simultaneously without altering its architecture.
- Classification Tasks: The text embeddings of the candidate category labels serve as dynamic classification heads. Inner products between these embeddings and the node vector give per-class scores, which are normalized into a probability distribution. Because the heads come from label text, the same decoder extends to tasks with any number of classes.
- Regression Tasks: Node vectors are fed directly into the pre-trained DEC, which decodes them into the final predicted numerical values.
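A minimal sketch of the classification path follows. The label embeddings here are hand-written vectors standing in for text-encoder outputs, and the use of a softmax to turn inner products into probabilities is an assumption; the source only states that the inner product yields a distribution.

```python
import math


def classify(node_vec, label_embeddings):
    """Score each candidate label by inner product, then normalize with a softmax."""
    labels = list(label_embeddings)
    logits = [sum(a * b for a, b in zip(node_vec, label_embeddings[l])) for l in labels]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


node_vec = [1.0, 0.0]

# Hypothetical binary task.
binary = classify(node_vec, {"churn": [2.0, 0.0], "stay": [0.0, 2.0]})

# Same function, three classes: only the label set changes.
ternary = classify(node_vec, {"low": [0.0, 1.0], "medium": [0.5, 0.5], "high": [1.0, 0.0]})
```

No output layer has to be resized between the two calls, which is what allows one decoder to serve tasks with different label sets.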
Training: Three-Stage Optimization Scheme
Griffin employs a three-stage pipeline of “Self-supervised Pre-training → Joint Supervised Fine-Tuning → Downstream Task Fine-tuning” to gradually inject capabilities ranging from general tabular semantics to specific RDB task knowledge.
Phase 1: Completion Pretraining
Griffin first performs self-supervised learning on massive and diverse single-table datasets, with tasks similar to “cloze tests.” The model predicts the embeddings of masked units based on known column information in a data row, minimizing the cosine distance between predicted and true embeddings. This establishes a foundational understanding of table structure and semantics.
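The completion objective can be illustrated with a small loss function: for each masked column, compare the predicted embedding with the true one using cosine distance, then average. The function names and the toy embeddings are assumptions, and the model that would produce `predicted` is left out.

```python
import math


def cosine_distance(pred, target):
    """1 minus cosine similarity; 0 when the vectors point the same way."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


def masked_completion_loss(predicted, true_embeddings, masked_columns):
    """Average cosine distance over the columns that were hidden from the model."""
    distances = [cosine_distance(predicted[c], true_embeddings[c]) for c in masked_columns]
    return sum(distances) / len(distances)


# Hypothetical row with two masked cells.
true_embeddings = {"price": [1.0, 0.0], "brand": [0.0, 1.0]}
predicted = {"price": [2.0, 0.0], "brand": [1.0, 0.0]}  # right direction, then orthogonal
loss = masked_completion_loss(predicted, true_embeddings, ["price", "brand"])  # 0.5
```

Cosine distance ignores vector length, so the `price` prediction is scored as perfect even though its magnitude is off; only the direction in embedding space is penalized.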
Phase 2: Joint Supervised Fine-Tuning (SFT)
After self-supervised pretraining, Griffin uses datasets from single-table or RDB tasks for supervised fine-tuning, further aligning the model with prediction needs and data characteristics in real-world scenarios.
Phase 3: Downstream Task Fine-tuning
Finally, after pre-training and SFT, Griffin is fine-tuned on specific downstream RDB benchmark tasks to reach its best performance in each application scenario.
Validation: Superiority of Three-Stage Training
To evaluate the contribution of each training stage to model performance, three variants of Griffin were analyzed in depth: Griffin-unpretrained (the base architecture without any pre-training), Griffin-pretrained (single-table pre-training and single-table SFT only), and Griffin-RDB-SFT (the complete three-stage training process).

The image above compares the performance of four GNN baseline models, four single-table baselines using DFS (Deep Feature Synthesis), and two Griffin variants. Each model was fine-tuned on a single task.
The leftmost subplot shows the average ranking across all tasks, while other subplots group tasks by evaluation metrics with corresponding averages.
Systematic experiments validated the effectiveness of Griffin’s architectural design and pre-training strategy. Griffin performed excellently in multiple RDB benchmarks (such as 4DBInfer and RelBench). Further analysis examined its cross-task transferability in few-shot scenarios and the impact of data domain relationships.

The core advantages of Griffin can be summarized as follows:
1. Powerful Base Architecture Performance
Even without pre-training (Griffin-unpretrained), thanks to designs such as unified encoding, cross-attention, and the hierarchical MPNN, the model fine-tuned on downstream RDB tasks still outperforms GNN baselines and traditional single-table models combined with Deep Feature Synthesis (DFS). This points to the strength of the architecture itself.
2. Universal Gains from Single-Table Pre-training
Griffin-pretrained, which underwent pre-training only on large-scale, diverse single-table data, achieved performance improvements compared to the unpretrained version. This validates that knowledge learned in single-table scenarios can be transferred to complex RDB tasks, enhancing model generalization.
3. Transfer Driven by RDB-SFT
When further fine-tuned with targeted RDB data (Griffin-RDB-SFT), the model demonstrates cross-task transfer capabilities under certain conditions, and the effect is most pronounced in few-shot scenarios. This depends on two factors:
- Data Similarity: If SFT data shares high similarity with the target task domain (e.g., cross-task transfer within the e-commerce sector), model performance improves.
- Data Diversity: Training on more diverse SFT data (e.g., using mixed data from multiple other domains such as sports, social media, and healthcare for SFT before transferring to an e-commerce task) also effectively boosts model performance.
Paper Link: https://arxiv.org/abs/2505.05568
Code Link: https://github.com/yanxwb/griffin