Introducing RelGT-AC: Graph Transformers for Relational Database Autocomplete — AppSofa Blog

The Problem

Relational databases hold much of an organization's most valuable structured data, yet standard machine learning workflows typically treat each table in isolation. A customer churn model trained only on a customers table ignores the relational context in orders, support_tickets, and payments.

Autocomplete — predicting missing cell values in a database — is a concrete instance of this problem. A model must understand not just the row in question, but the network of related records connected through foreign keys.

Our Approach: RelGT-AC

RelGT-AC represents a relational database as a heterogeneous graph — tables are node types, foreign key relationships are edges, and rows are individual nodes. We then apply graph transformer attention across this structure to predict missing values.
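To make the mapping concrete, here is a minimal sketch of turning tables and foreign keys into typed nodes and edges. It is an illustration only, not the RelGT-AC implementation: the toy `customers` and `orders` tables, the `foreign_keys` list, and the `build_hetero_graph` helper are all hypothetical names introduced for this example.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> list of rows (dicts).
tables = {
    "customers": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ],
    "orders": [
        {"id": 10, "customer_id": 1, "total": 25.0},
        {"id": 11, "customer_id": 1, "total": 40.0},
        {"id": 12, "customer_id": 2, "total": 15.0},
    ],
}

# Hypothetical foreign keys: (child table, child column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def build_hetero_graph(tables, foreign_keys, pk="id"):
    """Return (nodes, edges); nodes are keyed by (table name, primary key)."""
    nodes = {}
    for table, rows in tables.items():
        for row in rows:
            # The table name acts as the node type; the row is the node payload.
            nodes[(table, row[pk])] = row

    edges = defaultdict(list)
    for child, column, parent in foreign_keys:
        edge_type = (child, column, parent)
        for row in tables[child]:
            src = (child, row[pk])
            dst = (parent, row[column])
            if dst in nodes:  # skip dangling references
                edges[edge_type].append((src, dst))
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(tables, foreign_keys)
```

Each row becomes one node whose type is its table name, and each foreign key value becomes one typed edge from the child row to the parent row.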

Unlike message-passing GNNs that propagate information hop-by-hop, graph transformers apply attention directly across all nodes, enabling the model to capture long-range relational dependencies without over-squashing information through narrow bottlenecks.
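The difference can be seen on a four-node path graph. The sketch below compares one round of mean aggregation over direct neighbors with a single self-attention pass over all nodes. It is a simplified illustration under stated assumptions (one head, identity query/key/value projections, made-up features), not the attention layer used in RelGT-AC.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(features):
    """Single-head self-attention with identity projections (a simplification)."""
    d = len(features[0])
    outputs, weights = [], []
    for q in features:
        scores = [dot(q, k) / math.sqrt(d) for k in features]
        w = softmax(scores)  # one weight per node, including distant ones
        weights.append(w)
        outputs.append(
            [sum(w[j] * features[j][c] for j in range(len(features))) for c in range(d)]
        )
    return outputs, weights


def one_hop_mean(features, neighbors):
    """One message-passing round: average a node with its direct neighbors."""
    d = len(features[0])
    outputs = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        outputs.append([sum(v[c] for v in group) / len(group) for c in range(d)])
    return outputs


# Made-up features on the path graph 0 - 1 - 2 - 3.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
neighbors = [[1], [0, 2], [1, 3], [2]]

attended, weights = full_attention(features)
aggregated = one_hop_mean(features, neighbors)
```

After one round, `aggregated[0]` depends only on nodes 0 and 1, while `weights[0][3]` is already nonzero: node 0 attends to node 3 in a single step, where message passing would need three hops.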

Three Key Innovations

Column Masking

During subgraph encoding, we mask the target column to prevent trivial look-up — forcing the model to infer the missing value from relational context rather than directly reading it from the input.
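A minimal sketch of the masking idea follows. The `MASK` sentinel, the `mask_target` helper, and the choice to mask only the rows being predicted are assumptions made for illustration; the paper may mask differently.

```python
MASK = "<MASK>"  # hypothetical sentinel standing in for a hidden cell


def mask_target(rows, target_column, target_ids, pk="id"):
    """Hide the target column for the rows to predict; return (masked_rows, labels)."""
    masked, labels = [], {}
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row[pk] in target_ids:
            labels[row[pk]] = row[target_column]  # keep the true value as the label
            row[target_column] = MASK
        masked.append(row)
    return masked, labels


# Made-up rows for a churn-style target.
customers = [
    {"id": 1, "plan": "pro", "churned": True},
    {"id": 2, "plan": "free", "churned": False},
]

masked_rows, labels = mask_target(customers, "churned", target_ids={1})
```

The model input is built from `masked_rows`, so the value held in `labels` can only be recovered from the remaining columns and the related records.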

Unified Task Head

A single prediction head handles binary classification, multiclass classification, and regression, so one model can serve all autocomplete task types without architecture changes.
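One way such a head could work is sketched below: a single shared linear layer whose outputs are interpreted according to the task type. This is an assumption made for illustration, not the design described in the paper, and `unified_head`, `head_weights`, and `head_bias` are hypothetical names.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def unified_head(embedding, weights, bias, task, num_classes=None):
    """Shared linear layer; only the interpretation of its outputs varies by task."""
    logits = linear(embedding, weights, bias)
    if task == "binary":
        return 1.0 / (1.0 + math.exp(-logits[0]))  # sigmoid of the first output
    if task == "multiclass":
        return softmax(logits[:num_classes])  # class probabilities
    if task == "regression":
        return logits[0]  # raw first output
    raise ValueError(f"unknown task type: {task}")


# Made-up parameters: three outputs over a two-dimensional embedding.
head_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_bias = [0.0, 0.0, 0.0]
embedding = [2.0, -1.0]

p_binary = unified_head(embedding, head_weights, head_bias, "binary")
p_classes = unified_head(embedding, head_weights, head_bias, "multiclass", num_classes=3)
y_hat = unified_head(embedding, head_weights, head_bias, "regression")
```

The same parameters serve all three calls, so switching task type changes only the output interpretation (and, during training, the loss), not the architecture.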

TF-IDF Text Encoder

Free-text columns carry lexical signals that pure embedding models can miss. We encode these columns as TF-IDF features before the graph transformer, giving the model access to term-level information from unstructured text within the relational schema.
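For readers unfamiliar with TF-IDF, here is a small standard-library sketch of turning a free-text column into weighted term vectors. The whitespace tokenizer, the smoothed IDF formula, and the `tfidf` helper are choices made for this example and are not necessarily what RelGT-AC uses.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, vectors) using raw term counts and a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Made-up values from a free-text support ticket column.
tickets = ["refund not received", "refund issued", "login not working"]
vocab, vectors = tfidf(tickets)
```

Terms that appear in fewer rows, such as "issued", receive a higher weight than terms shared across rows, such as "refund". The resulting vectors can then be attached to row nodes as input features.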

Results

We evaluated RelGT-AC on seven tasks across three RelBench v2 datasets — a benchmark suite for relational machine learning covering real-world business databases.

+10 AUROC on text-heavy eligibility tasks vs. GraphSAGE

3 / 3 regression tasks where RelGT-AC outperforms the GraphSAGE baseline

The gains are largest on tasks with rich text columns, which is consistent with the TF-IDF encoder capturing meaningful lexical signals. On regression tasks, the unified task head and column masking together let RelGT-AC draw on cross-table relational context that GraphSAGE's local aggregation cannot reach.
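As a reminder of what the AUROC metric measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. The labels and scores are made up for illustration and are not results from the paper.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly: 0.75
```

On this 0-to-1 scale, a gain of 10 points would correspond to a difference of 0.10, assuming the headline figure is reported in percentage points.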

What's Next

RelGT-AC is a foundation — the same architecture applies to any ML task on relational data, not just autocomplete. Future work will explore:

Scaling to larger databases with millions of rows via sparse attention
Pre-training on diverse relational schemas (foundation model direction)
Integration with our enterprise Knowledge Graph and Ontology Platform services
Applying the approach to federal agency data environments

Read the Full Paper

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

Phillip Jiang · arXiv 2606.03040 · 2026
