June 1, 2026

Introducing RelGT-AC: Graph Transformers for Relational Database Autocomplete

We have published our first research paper. RelGT-AC extends the graph transformer architecture to autocomplete prediction tasks in relational databases, achieving gains of up to 10 AUROC points over GraphSAGE baselines on text-heavy tasks.

Phillip Jiang

The Problem

Relational databases hold much of an organization's most valuable structured data, yet standard machine learning workflows often treat each table in isolation. For example, a customer churn model trained only on a customers table ignores the rich relational context in orders, support_tickets, and payments.

Autocomplete — predicting missing cell values in a database — is a concrete instance of this problem. A model must understand not just the row in question, but the network of related records connected through foreign keys.

Our Approach: RelGT-AC

RelGT-AC represents a relational database as a heterogeneous graph — tables are node types, foreign key relationships are edges, and rows are individual nodes. We then apply graph transformer attention across this structure to predict missing values.
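To make the mapping concrete, here is a minimal sketch of turning tables and foreign keys into typed nodes and edges. The toy tables, the foreign_keys list, and the build_hetero_graph helper are all hypothetical names chosen for illustration; the paper's actual graph construction may differ.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> list of rows (dicts).
tables = {
    "customers": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ],
    "orders": [
        {"id": 10, "customer_id": 1, "amount": 25.0},
        {"id": 11, "customer_id": 1, "amount": 40.0},
        {"id": 12, "customer_id": 2, "amount": 15.0},
    ],
}

# Hypothetical foreign keys: (child table, foreign key column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def build_hetero_graph(tables, foreign_keys, pk="id"):
    """Return (nodes, edges): each row is a node typed by its table name."""
    # A node is identified by (table name, primary key); the table is its type.
    nodes = {
        (table, row[pk]): row
        for table, rows in tables.items()
        for row in rows
    }
    # Edges are grouped by edge type: (child table, fk column, parent table).
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        for row in tables[child]:
            parent_key = row.get(fk_col)
            # Skip NULL or dangling foreign keys instead of inventing nodes.
            if parent_key is not None and (parent, parent_key) in nodes:
                edges[(child, fk_col, parent)].append(
                    ((child, row[pk]), (parent, parent_key))
                )
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(tables, foreign_keys)
print(len(nodes), "nodes")
print(edges[("orders", "customer_id", "customers")])
```

Keeping edges grouped by edge type preserves which foreign key produced each link, which is what makes the graph heterogeneous rather than a single untyped adjacency list.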

Unlike message-passing GNNs that propagate information hop-by-hop, graph transformers apply attention directly across all nodes, enabling the model to capture long-range relational dependencies without over-squashing information through narrow bottlenecks.
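The difference can be illustrated on a toy path graph. The sketch below is an assumption-laden simplification: it uses one scalar feature per node, mean aggregation for message passing, and unparameterized dot-product attention with no learned projections, so it shows only the reachability contrast, not the RelGT-AC layer itself.

```python
import math

# Hypothetical path graph 0 - 1 - 2 - 3 with one scalar feature per node.
features = [1.0, 2.0, 3.0, 4.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def message_passing_step(features, neighbors):
    """One round of mean aggregation over direct neighbors only."""
    return [
        sum(features[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(len(features))
    ]


def attention_weights(features, query_index):
    """Softmax over dot-product scores between one node and every node."""
    q = features[query_index]
    scores = [q * k for k in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query_index):
    """Attention output for one node: weighted sum over all node features."""
    weights = attention_weights(features, query_index)
    return sum(w * f for w, f in zip(weights, features))


# After one message-passing round, node 0 has only seen node 1.
one_hop = message_passing_step(features, neighbors)

# Changing the far end of the path does not affect node 0 after one round...
changed = features[:3] + [100.0]
one_hop_changed = message_passing_step(changed, neighbors)

# ...but a single attention step gives node 3 a nonzero weight for node 0.
weights_for_node_0 = attention_weights(features, 0)
```

In this toy setting, node 0 would need three rounds of message passing before node 3 could influence it, whereas one attention step already assigns node 3 a nonzero weight.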

Three Key Innovations

01

Column Masking

During subgraph encoding, we mask the target column to prevent trivial look-up — forcing the model to infer the missing value from relational context rather than directly reading it from the input.
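A minimal sketch of the idea, assuming rows are plain dictionaries and that only the seed row's target cell is hidden. The MASK sentinel and the mask_target helper are hypothetical; how masking is applied to other rows in the sampled subgraph is not covered here.

```python
MASK = "<MASK>"  # hypothetical sentinel; a real encoder might use a learned mask embedding


def mask_target(row, target_column, mask_value=MASK):
    """Return (masked_row, label) with the target cell hidden from the encoder."""
    if target_column not in row:
        raise KeyError(f"unknown column: {target_column}")
    masked = dict(row)  # copy so the stored row is not modified
    label = masked[target_column]
    masked[target_column] = mask_value
    return masked, label


# Hypothetical seed row whose "status" cell we want the model to predict.
seed_row = {"id": 10, "customer_id": 1, "amount": 25.0, "status": "refunded"}
masked_row, label = mask_target(seed_row, "status")
print(masked_row)
print(label)
```

The label is kept only for the loss computation, while the encoder sees the masked row plus its relational neighbors.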

02

Unified Task Head

A single prediction head handles binary classification, multiclass classification, and regression, so one model can serve all autocomplete task types without architecture changes.
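One way such a head could be organized is sketched below: a shared linear layer produces logits, and the task type selects how they are read out. This is an illustrative assumption, not the paper's design; the unified_head function, its parameters, and the toy weights are all hypothetical.

```python
import math


def linear(weights, bias, x):
    """Plain dense layer: one output per weight row."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def unified_head(hidden, weights, bias, task, num_classes=None):
    """Read shared logits out as a probability, a distribution, or a value."""
    logits = linear(weights, bias, hidden)
    if task == "binary":
        # Sigmoid over the first logit gives P(positive).
        return 1.0 / (1.0 + math.exp(-logits[0]))
    if task == "multiclass":
        # Softmax over the first num_classes logits.
        used = logits[:num_classes]
        top = max(used)
        exps = [math.exp(v - top) for v in used]
        total = sum(exps)
        return [e / total for e in exps]
    if task == "regression":
        # The first logit is used directly as the predicted value.
        return logits[0]
    raise ValueError(f"unknown task type: {task}")


# Hypothetical 2-dimensional node representation and 3-output head.
hidden = [1.0, -1.0]
weights = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0, 0.0]

p_binary = unified_head(hidden, weights, bias, "binary")
p_classes = unified_head(hidden, weights, bias, "multiclass", num_classes=3)
y_reg = unified_head(hidden, weights, bias, "regression")
```

In this sketch the parameters are shared across task types and only the readout and the loss would change per task.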

03

TF-IDF Text Encoder

Free-text columns carry lexical signals that pure embedding models can miss. We encode these columns as TF-IDF features before the graph transformer, giving the model term-level information about unstructured text within the relational schema.
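For readers who want to see what TF-IDF features look like, here is a small self-contained sketch. It assumes whitespace tokenization and a common smoothed IDF variant, log((1 + n) / (1 + df)) + 1; the tfidf helper and the example cells are hypothetical, and the tokenizer and weighting used in the paper may differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (assumed variant, see lead-in).
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical free-text cells from a support_tickets table.
docs = [
    "refund requested for damaged item",
    "item arrived late",
    "refund issued",
]
vocab, vectors = tfidf(docs)
first = dict(zip(vocab, vectors[0]))
print(first)
```

Terms that appear in fewer documents, such as "damaged" here, receive a higher weight than terms shared across documents, such as "refund".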

Results

We evaluated RelGT-AC on seven tasks across three RelBench v2 datasets — a benchmark suite for relational machine learning covering real-world business databases.

+10 AUROC: on text-heavy eligibility tasks vs. GraphSAGE
3 / 3: regression tasks where RelGT-AC outperforms the GraphSAGE baseline

The gains are largest on tasks with rich text columns, which is consistent with the TF-IDF encoder capturing meaningful lexical signals. On regression tasks, we attribute the gains to the unified task head and column masking, which together let RelGT-AC draw on cross-table relational context that is harder to reach with GraphSAGE's local aggregation.
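As a reminder of what the headline metric measures, AUROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it by direct pair counting on made-up labels and scores; the auroc helper and the data are hypothetical and are not results from the paper.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up labels and model scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
value = auroc(labels, scores)
print(value)
```

If the reported gain is read as points on a 0 to 100 scale, which is an assumption here, a gain of 10 corresponds to 0.10 on the 0 to 1 scale this function returns.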

What's Next

RelGT-AC is a foundation: the same architecture can in principle be applied to other ML tasks on relational data, not just autocomplete. Future work will explore:

  • Scaling to larger databases with millions of rows via sparse attention
  • Pre-training on diverse relational schemas (foundation model direction)
  • Integration with our enterprise Knowledge Graph and Ontology Platform services
  • Applying the approach to federal agency data environments

Read the Full Paper

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

Phillip Jiang · arXiv 2606.03040 · 2026