Document Classification with Graph Neural Networks

Bofin Babu
1 min read · Jun 27, 2021

Summary: You can construct a document graph, then apply a GNN to learn document embeddings, which are fed into a softmax layer to produce a probability distribution over the set of class labels.
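To make the last part of that sentence concrete, here is a minimal sketch of the classification head, assuming the GNN has already produced a document embedding. The names `classify`, `weights` and `bias`, and all the numbers, are made up for illustration; in a real model the weights would be learned.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(doc_embedding, weights, bias):
    """Linear layer followed by softmax over the class labels."""
    logits = [
        sum(w * x for w, x in zip(row, doc_embedding)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical 3-dimensional document embedding and a 2-class head.
doc_embedding = [0.2, -0.1, 0.4]
weights = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
bias = [0.0, 0.0]
probs = classify(doc_embedding, weights, bias)
```

The predicted label is simply the index of the largest entry in `probs`.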

Typical pipeline for document classification with GNNs

Step 1 — Graph Construction: The usual approach is to construct a single heterogeneous graph for the whole corpus, containing word nodes and document nodes, and to connect the nodes with edges based on word co-occurrence and document-word relations. For short texts and non-text sources, you can enrich the semantics with additional information such as topics and named entities.
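A rough sketch of such a corpus-level graph, using only the standard library. The edge weights are my assumptions, not something this post prescribes: document-word edges get a TF-IDF weight, and word-word edges count the number of sliding windows in which two words appear together. `build_corpus_graph` and the `window` parameter are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_corpus_graph(docs, window=3):
    """Build one graph over the whole corpus with word and document nodes.

    Returns a dict mapping an edge (node_a, node_b) to its weight.
    Node ids are ("doc", index) or ("word", token).
    """
    tokenized = [doc.lower().split() for doc in docs]
    edges = defaultdict(float)

    # Document frequency of each word, needed for the TF-IDF weights.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    for i, tokens in enumerate(tokenized):
        # Document-word edges, weighted by TF-IDF (an assumed weighting).
        for word, tf in Counter(tokens).items():
            idf = math.log(len(docs) / doc_freq[word]) + 1.0
            edges[(("doc", i), ("word", word))] = tf * idf

        # Word-word edges: count the windows that contain both words.
        for start in range(max(1, len(tokens) - window + 1)):
            span = sorted(set(tokens[start:start + window]))
            for a, b in combinations(span, 2):
                edges[(("word", a), ("word", b))] += 1.0

    return dict(edges)


docs = ["graph neural networks", "neural networks classify documents"]
graph = build_corpus_graph(docs)
```

Sorting the words inside each window keeps every word pair under a single key, so the graph is effectively undirected.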

Step 2 — Graph Representation Learning: Encode the graph structure so that machine learning models can exploit it easily. You can use various GNN models such as Graph CNNs, TensorGCN, or GGNN (gated graph neural network).
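As an illustration, here is a dependency-free sketch of a single graph convolution step, ReLU(Â X W), where Â is the adjacency matrix with self-loops, symmetrically normalized. This is one common GCN formulation, picked here as an example rather than the only option. The toy graph, the features and the weight values are invented; in practice you would train the weights and likely use a library such as PyTorch Geometric or DGL.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a non-negative adjacency matrix."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / (degree[i] * degree[j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy path graph with 3 nodes: 0 - 1 - 2.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot inputs
weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # made-up 3x2 weight matrix

adj_norm = normalize_adjacency(adj)
node_embeddings = gcn_layer(adj_norm, features, weights)
```

Each row of `node_embeddings` mixes a node's own features with those of its neighbours. Stacking two such layers lets information travel two hops, for example from one document to another through a shared word.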

Step 3 — Node Embedding Initialization: Map each node to a low-dimensional embedding. You can use pre-trained word embeddings for this. These initial embeddings are the input features that the GNN then refines.
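A small sketch of what this initialization could look like. The `pretrained` dictionary is a stand-in for real pre-trained vectors (word2vec, GloVe and the like). The random fallback for unknown words and the averaging used for document nodes are my assumptions, not something this post specifies.

```python
import random


def init_word_embeddings(vocab, pretrained, dim, seed=0):
    """Return one `dim`-sized vector per word in `vocab`.

    Words found in `pretrained` reuse their vector; the rest get small
    random values (an assumed fallback).
    """
    rng = random.Random(seed)
    embeddings = {}
    for word in vocab:
        if word in pretrained:
            embeddings[word] = list(pretrained[word])
        else:
            embeddings[word] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return embeddings


def init_doc_embedding(tokens, word_embeddings, dim):
    """Average the vectors of a document's words (an assumed heuristic)."""
    vectors = [word_embeddings[t] for t in tokens if t in word_embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical 2-dimensional "pre-trained" vectors, for illustration only.
pretrained = {"graph": [1.0, 0.0], "neural": [0.0, 1.0]}
vocab = ["graph", "neural", "networks"]

word_emb = init_word_embeddings(vocab, pretrained, dim=2)
doc_vec = init_doc_embedding(["graph", "neural"], word_emb, dim=2)
```

Another simple choice is one-hot vectors for every node, which avoids pre-trained vectors entirely at the cost of a much wider input layer.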

Bofin Babu

Cofounder @ CloudSEK. Past: NVIDIA. Applying AI for Cybersecurity.