Voxels to Graph

This project explores the key contributions of Building-GAN by AutodeskAILab: 1) a novel 3D representation called voxel graph that can encode irregular voxel grids with non-uniform space partitioning, overcoming limitations of traditional regular voxel grids, and 2) a graph-conditioned generative adversarial network (GAN) leveraging graph neural networks (GNNs). The implementation uses their dataset to study these approaches. The structure of the raw data is examined first in the following section.

In the Data Collection part of the paper, it says that the dataset was created as follows:

Since there is no publicly available dataset for volumetric designs from real buildings, we create a synthetic dataset with 120,000 volumetric designs for commercial buildings using parametric models. The heuristics behind the parametric models are based on the rules and knowledge provided by professional architects. Although these parametric models are able to explore possible volumetric designs, they are not capable of fitting the constraints. Therefore, we generate the designs first and then compute the voxel graph, program graph, FAR, and TPR for each design. (...) Here, we consider 6 program types: lobby/corridor, restroom, stairs, elevator, office, and mechanical room. Each program node feature includes the program type and the story level.

Each datum of the created dataset consists of three JSONs as follows:

graph_global_*.json: contains the FAR (Floor Area Ratio), site area (Total land area to build a building), program ratios, and program types (6 program types described above + void type).


    {
        "far": 3.1144640998959425,
        "site_area": 961,
        "global_node": [
            {
                "type": 3,
                "proportion": 0.04811226194453724,
            
                (...)
            },
            {
                "type": 0,
                "proportion": 0.25526227865018364,
            
                (...)
            },

            (...)
        ]
    }

graph_global_*.json

graph_local_*.json: contains the nodes. Each node has the following attributes: floor \(\text{F}\) (the floor level on which the node is placed), program type \(\text{T}\), index \(\text{I}\), and neighbors that indicate connectivity between nodes. In the local graph, the unique index of each node is created as \(\text{[F, T, I]}\), and the neighbors contain these unique indices. In the example below, the first node's unique index is \(\text{[0, 3, 0]}\), and it has \(\text{[1, 3, 0]}\) as a neighbor. Similarly, the node with index \(\text{[1, 3, 0]}\) has \(\text{[0, 3, 0]}\) as a neighbor.
```
    {
        "node": [
            {
                "floor": 0,
                "type": 3,
                "type_id": 0,
                "neighbors": [[0, 0, 0], [1, 3, 0]]
            
                (...)
            },
            {
                "floor": 1,
                "type": 3,
                "type_id": 0,
                "neighbors": [[1, 0, 0], [2, 3, 0], [0, 3, 0]]
            
                (...)
            },
            
            (...)
        ]
    }
```
graph_local_*.json

voxel_*.json: contains the voxel nodes. Each voxel node has the following attributes: location (which serves as a unique index in the voxel graph), program type, dimension (the depth, height, and width of an irregular voxel), coordinate (the center of the voxel), and neighbors that indicate connectivity between nodes.


    {
        "voxel_node": [
            {
                "location": [0, 2, 7],
                "type": 3,
                "coordinate": [0.0, 20.0, 37.0],
                "dimension": [7.0, 3.0, 3.0],
                "neighbors": [[1, 2, 7], [0, 1, 7], [0, 3, 7], [0, 2, 6]]
            
                (...)
            },
            {
                "location": [1, 2, 7],
                "type": 3,
                "coordinate": [7.0, 20.0, 37.0],
                "dimension": [4.0, 3.0, 3.0],
                "neighbors": [[0, 2, 7], [2, 2, 7], [1, 1, 7], [1, 3, 7], [1, 2, 6]]
            
                (...)
            },
            
            (...)
        ]
    }

voxel_*.json

A Graph data structure consists of vertices and edges \(G(V, E)\). The vertices are sometimes also referred to as nodes and the edges are lines or arcs that connect any two nodes in the graph. The graph data structure can be represented as an adjacency list (or matrix) to indicate connectivity between nodes.

To convert voxel data into a graph structure, an index must be assigned to each voxel. The index of each voxel has an \(\text{(x, y, z)} \)-shaped structure and is unique, so that it can be converted into an \(\text{(0, 1, 2,} \cdot \cdot \cdot \text{, n)} \)-shaped structure. Because each voxel is treated as a node, the last index of the node is the number of voxels minus one.

Since a voxel is a cuboid-shaped geometry, connectivity between voxel nodes can be determined by 3D 6-way connectivity. Consider the uppermost voxel with index \(319\) in the right diagram of the figure below. Among the six faces of this voxel, only three (right, front, and bottom) are connected to adjacent voxels. The other faces do not touch any neighboring voxels. According to the 3D 6-way connectivity system, the node with index \(319\) has neighbor nodes with indices \(255\), \(311\), and \(318\). Therefore, using the adjacency list representation, it can be expressed as \(\text{319: [255, 311, 318]}\). Similarly, by the same logic, the node with index \(24\) has faces that are adjacent to four voxels located to the left, right, front, and top. Therefore, using the adjacency list representation, it can be expressed as \(\text{24: [16, 25, 32, 88]}\).

In this system, the number of neighboring voxel nodes is at least 3 and at most 6, since a voxel is a cuboid-shaped geometry with 6 faces.

Voxel to Graph From the left, Unique Index · Adjacency List Representation — Voxel to Graph
From the left, Unique Index · Adjacency List Representation

Each node in the graph data has feature vectors of the same shape as every other node. As noted above, each voxel is converted into a node, so the node's features are the voxel's geometric attributes. In the raw data, each voxel node has the following features: coordinate \((3)\), dimension \((3)\), and location \((3)\) (voxel_graph_features is a flattened vector \((9)\) containing all three). Additionally, the FAR \((1)\) feature from graph_global_*.json and the floor level \((1)\) of each voxel are combined as part of the node features. The final feature vector of each node has a length of \((11)\); therefore, each graph is represented as an \((\text{N}, 11)\)-shaped matrix, where \(\text{N}\) is the number of nodes.

The details of processing the local graph are omitted here. The code for processing the local graph is available at this link.


    class VoxelGraphData:
        def __init__(self, voxel_graph_data: dict):
            voxel_graph_types_onehot = voxel_graph_data["voxel_graph_types_onehot"]
            voxel_graph_features = voxel_graph_data["voxel_graph_features"]
            voxel_graph_far_per_node = torch.zeros(voxel_graph_types_onehot.shape[0], 1) + voxel_graph_data["far"]
            voxel_graph_floor_levels_normalized = voxel_graph_data["voxel_graph_floor_levels_normalized"].unsqueeze(0).t()

            self.x = torch.cat(
                [
                    voxel_graph_features,
                    voxel_graph_far_per_node,
                    voxel_graph_floor_levels_normalized,
                ],
                dim=1,
            )

            self.data_number = voxel_graph_data["data_number"]
            self.voxel_graph_types = voxel_graph_data["voxel_graph_types"]
            self.voxel_graph_types_onehot = voxel_graph_types_onehot
            self.edge_index = voxel_graph_data["voxel_graph_edge_indices"]
            self.voxel_graph_floor_levels = voxel_graph_data["voxel_graph_floor_levels"]
            self.voxel_graph_node_coordinate = voxel_graph_data["voxel_graph_node_coordinate"]
            self.voxel_graph_node_dimension = voxel_graph_data["voxel_graph_node_dimension"]
            self.voxel_graph_location = voxel_graph_data["voxel_graph_location"]
            self.voxel_graph_node_ratio = voxel_graph_types_onehot * voxel_graph_data["voxel_graph_node_ratio"]
            self.voxel_graph_node_ratio = self.voxel_graph_node_ratio.max(dim=1)[0].unsqueeze(1)

VoxelGraphData class

Here, since PyTorch Geometric is used for training models with graphs, node features and the connectivity between nodes are represented as x and edge_index, respectively. Each graph is instantiated by Data of PyTorch Geometric. The raw data, excluding x and edge_index, can also be passed as kwargs to be used during training or evaluation. It should be noted that, in training with graphs, mini-batching is performed at the node level. Each data field must therefore match the number of nodes; otherwise, problems may occur during batching.


    class GraphDataset(Dataset):
        def __init__(self, configuration: Configuration):
            super().__init__()

            self.configuration = configuration
            
            (...)

            self.voxel_graph_data = []
            for local_graph_file, voxel_graph_file in zip(self.local_graph_data_files, self.voxel_graph_data_files):
                assert local_graph_file.split("/")[-1].split("_")[0] == voxel_graph_file.split("/")[-1].split("_")[0]
                
                (...)
    
                voxel_graph = torch.load(voxel_graph_file)
                self.voxel_graph_data.append(
                    Data(
                        x=voxel_graph.x,
                        edge_index=voxel_graph.edge_index,
                        voxel_level=voxel_graph.voxel_graph_floor_levels,
                        type=voxel_graph.voxel_graph_types,
                        types_onehot=voxel_graph.voxel_graph_types_onehot,
                        coordinate=voxel_graph.voxel_graph_node_coordinate,
                        dimension=voxel_graph.voxel_graph_node_dimension,
                        location=voxel_graph.voxel_graph_location,
                        node_ratio=voxel_graph.voxel_graph_node_ratio,
                        data_number=[voxel_graph.data_number] * voxel_graph.x.shape[0],
                    )
                )
    
        def __getitem__(self, i):
            return self.local_graph_data[i], self.voxel_graph_data[i]
    
        def __len__(self):
            return len(self.local_graph_data)

GraphDataset class

The detailed logic for converting voxels into a graph at the code level can be found through the following link.

The GNN model can be broadly divided into link prediction and node classification tasks. In the paper, the model generates a building consisting of voxels with labels given a local graph and a voxel graph, so the task corresponds to node prediction. The diagram below briefly illustrates the training process. The process follows a WGAN-GP architecture, in which both the generator and the discriminator are based on Graph Neural Networks.

VoxelGNNGenerator and VoxelGNNDiscriminator both use the local graph and the voxel graph as inputs. Additionally, the generator takes noise z as an input, while the discriminator takes the labels generated by VoxelGNNGenerator as an additional input. The discriminator learns to determine whether the node labels are real or fake based on the two input graphs and z. The generator is then updated using both the discriminator's feedback and auxiliary losses.

The detailed code of VoxelGNNGenerator is as follows. The very first step in the generator's forward pass is to compute matched_x, which is matched by combining features from the local graph and the voxel graph. The local graph is a bubble diagram that abstractly represents the building's spatial composition, so it and the voxel graph have different numbers of nodes and cannot be matched 1:1. Therefore, their features are combined by aggregating nodes of the same type, using averaging. In this way, the features of the local graph nodes are abstractly combined for each type of voxel node.

Then, the abstractly combined graph features are passed through the matched_features_encoder module for encoding. Next, these encoded matched features are concatenated with the original voxel graph features and the noise vector z to create combined_features. This concatenation serves multiple purposes: the voxel graph features preserve the geometric details, and the noise vector z enables the model to produce diverse outputs even with the same conditions.

The combined_features are then passed through the mlp_encoder to produce encoded graph features, which serve as the input to the encoder of VoxelGNNGenerator. The encoder performs message passing using the voxel graph's edge connections to generate node embeddings.


    class VoxelGNNGenerator(nn.Module):
        def __init__(self, configuration: Configuration):
            super().__init__()

            (...)

        def forward(self, local_graph, voxel_graph, z):
            device = local_graph.x.device
    
            matched_x = torch.zeros((voxel_graph.x.shape[0], local_graph.x.shape[1]), device=device)
    
            for type_idx in torch.unique(voxel_graph.type):
                voxel_mask = voxel_graph.type == type_idx
                local_mask = local_graph.type == type_idx
    
                if local_mask.sum() > 0:
                    matched_x[voxel_mask] = local_graph.x[local_mask].mean(dim=0)
    
            encoded_matched_features = self.matched_features_encoder(
                matched_x.to(self.matched_features_encoder[0].weight.device)
            )
    
            combined_features = torch.cat(
                [
                    encoded_matched_features, 
                    voxel_graph.x.to(encoded_matched_features.device), 
                    z.squeeze(0).to(encoded_matched_features.device)
                ], dim=-1
            )
    
            x = self.mlp_encoder(combined_features)
            encoded = self.encoder(x=x, edge_index=voxel_graph.edge_index)
    
            final_features = torch.cat([encoded, x, encoded_matched_features, voxel_graph.x, z.squeeze(0)], dim=-1)
    
            logits = self.decoder(final_features)

            # https://github.com/AutodeskAILab/Building-GAN/blob/master/util.py
            label_soft = torch.nn.functional.gumbel_softmax(logits, tau=1.0)
            label_hard = torch.zeros_like(label_soft)
            label_hard.scatter_(-1, label_soft.argmax(dim=1, keepdim=True), 1.0)
            label_hard = label_hard - label_soft.detach() + label_soft
    
            return logits, label_hard, label_soft

VoxelGNNGenerator

Finally, the model creates final_features by concatenating the encoded output with all previous feature representations: the encoder output, the initial combined features, the encoded matched features, the original voxel features, and the noise vector. This rich feature representation is passed through a decoder to produce logits for each voxel's class prediction.

In this project, auxiliary losses are used in addition to WGAN-GP, namely the FAR of the building generated by the model and the ratio of voxel types. For this purpose, the logits need to be converted to hard labels, but using simple argmax would break the gradient, so Gumbel Softmax is used. The temperature parameter \(\tau\) controls how discrete the output is: when \(\tau\) is close to \(0\), the output becomes like a one-hot vector (hard), when \(\tau = 1.0\) the output is soft, and when \(\tau\) approaches \(\infty\), the output becomes almost uniform. \[ \,\\ y_i = \frac{\exp( ( \log(\pi_i) + g_i ) / \tau)}{\sum_{j=1}^{\tiny{K}} \exp( ( \log(\pi_j) + g_j ) / \tau)} \,\\ \] The discriminator's loss is defined as d_fake.mean() - d_real.mean(), which minimizes the distance between the real data distribution and the generated data distribution (maximizing the output difference of a function satisfying the 1-Lipschitz condition). The discriminator is trained to assign low values to fake data and high values to real data.

To minimize this distance, the generator must produce buildings with a distribution that resembles the real data. In other words, the generator aims to increase the discriminator's score for fake data, using adversarial loss as the main objective, with the above-mentioned auxiliary losses as additional objectives: \[ \,\\ \begin{aligned} \text{Loss}_{\text{G}} \, = &\quad\, \text{loss}_{\text{adv}} \cdot \lambda_{\text{adv}} \\ & + \text{loss}_{\text{ratio}} \cdot \lambda_{\text{ratio}} \\ & + \text{loss}_{\text{far}} \cdot \lambda_{\text{far}} \end{aligned} \,\\ \]


    def _compute_generator_loss(self, local_graph, voxel_graph, logits, label_hard):

        d_fake = self.discriminator(local_graph, voxel_graph, label_hard)
        if self.configuration.USE_WGANGP:
            g_loss_adv = -d_fake.mean()
        
        else:
            g_loss_adv = torch.nn.functional.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
            
        g_loss_adv *= self.configuration.LAMBDA_ADV
        
        label_ratio_g = label_hard.squeeze(0).sum(dim=0) / voxel_graph.num_nodes
        label_ratio = voxel_graph.types_onehot.sum(dim=0) / voxel_graph.num_nodes

        g_loss_ratio = torch.nn.functional.mse_loss(label_ratio_g[:-2], label_ratio[:-2])
        g_loss_ratio *= self.configuration.LAMBDA_RATIO

        g_loss_ratio_voids = torch.nn.functional.mse_loss(label_ratio_g[-2:], label_ratio[-2:])
        g_loss_ratio_voids *= self.configuration.LAMBDA_RATIO_VOID
        
        voxel_types_generated = label_hard.squeeze(0).argmax(dim=1)
            
        far_unique = []
        far_unique_generated = []

        si = 0
        for gi in range(voxel_graph.num_graphs):
            each_voxel_graph = voxel_graph[gi]
            each_far = each_voxel_graph.x[0][9]
            each_dimension = each_voxel_graph.x[:, 3:6] * self.configuration.NORMALIZATION_FACTOR_DIMENSION
            
            ei = si + each_voxel_graph.num_nodes
            each_voxel_types_generated = voxel_types_generated[si:ei]
            each_dimension_to_use = each_dimension[each_voxel_types_generated != self.configuration.VOID]
            
            each_gfa = (each_dimension_to_use[:, 1] * each_dimension_to_use[:, 2]).sum()
            each_far_generated = each_gfa / each_voxel_graph.site_area[0]
            
            far_unique.append(each_far)
            far_unique_generated.append(each_far_generated)
            
            si = ei
            
        g_loss_far = torch.nn.functional.mse_loss(torch.tensor(far_unique_generated), torch.tensor(far_unique))
        g_loss_far *= self.configuration.LAMBDA_FAR
        
        g_loss = g_loss_adv + g_loss_ratio + g_loss_ratio_voids + g_loss_far
        
        return g_loss

\( \text{Loss}_{\text{G}} \) computation

The lambdas for the generator are carefully balanced to ensure stable training. The adversarial loss takes priority with \(\lambda_{\text{adv}} = 1.0\), while the auxiliary losses for program type ratios and FAR are weighted at \(\lambda_{\text{ratio}} = 0.1\) and \(\lambda_{\text{far}} = 0.1\) respectively. This configuration allows the model to focus primarily on generating buildings' program distribution while maintaining architectural constraints.

For this implementation, a subset of 10,000 graphs from the total 120,000 dataset is used. The monitoring results from training on 10,000 graphs for 1,000 epochs are as follows. Each metric commonly used for evaluating classification models was measured, and all graphs show convergence.

Losses & Metrics From the left, Generator Loss · Accuracy · Recall · Precision · F1 — Losses & Metrics
From the left, Generator Loss · Accuracy · Recall · Precision · F1

However, since the total number of nodes within the 10,000 samples is approximately 4,000,000, the large number of nodes means that, even when misclassified labels are present, the metrics cannot reflect the performance differences. In addition, since VoxelGNNGenerator performs node-wise label prediction, it is difficult to evaluate the generation quality at the graph level.

Graph-wise f1 scores From the left, Minimum f1 score of training dataset Minimum f1 score of validation dataset Weighted sum… — Graph-wise f1 scores
From the left, Minimum f1 score of training dataset
Minimum f1 score of validation dataset
Weighted sum of f1 scores

To address this problem, the validation method computes the graph-wise f1 scores on both training and validation datasets. By calculating f1 scores for node labels separately for each graph, the quality of each building can be evaluated, even though generating these metrics takes more time. Therefore, the model selection criterion \(\mathcal{C}\) uses a weighted sum of the worst-performing graphs from both sets. The model is saved when this weighted sum of f1 scores improves, computed as follows, where \(w_{\text{t}} = 0.05\) and \(w_{\text{v}} = 1.0\). \[ \,\\ \mathcal{C} = \min_{g \,\in\, G_{\text{t}}} \text{f1}(g) \cdot w_{\text{t}} + \min_{g \,\in\, G_{\text{v}}} \text{f1}(g) \cdot w_{\text{v}} \,\\ \]


    current_f1_score = (
        f1_score_min_train * self.configuration.F1_SCORE_TRAIN_WEIGHT
        + f1_score_min_validation * self.configuration.F1_SCORE_VALIDATION_WEIGHT
    )
    
    if best_f1_score < current_f1_score:
        best_f1_score = current_f1_score

        torch.save(
            {
                "epoch_start": epoch,
                "epoch_end": epoch_end,
                "best_f1_score": best_f1_score,
                "f1_score_train": f1_score_train,
                "f1_score_validation": f1_score_validation,
                "f1_score_min_train": f1_score_min_train,
                "f1_score_min_validation": f1_score_min_validation,
                "recall_score_train": recall_score_train,
                "recall_score_validation": recall_score_validation,
                "accuracy_score_train": accuracy_score_train,
                "accuracy_score_validation": accuracy_score_validation,
                "generator": self.generator.state_dict(),
                "discriminator": self.discriminator.state_dict(),
                "optimizer_generator": self.optimizer_generator.state_dict(),
                "optimizer_discriminator": self.optimizer_discriminator.state_dict(),
                "scheduler_generator": self.scheduler_generator.state_dict(),
            },
            os.path.join(self.log_dir, "states.pt"),
        )

Condition to save model status

Graph-based models operate with graphs as batch, but since computations are performed at the node level, even when the batch size remains constant, the number of nodes varies with each input. Therefore, when using BatchNorm, statistical instability can occur in graphs with fewer nodes. Particularly when using running mean and variance in eval() mode, performance may drop significantly. This is because graphs with fewer nodes tend to have larger differences from the overall training distribution. As a solution to this issue, LayerNorm can be used. It performs normalization independently for each node across the feature dimension, making it more suitable for graphs with varying numbers of nodes and providing more stable performance during evaluation.

BatchNorm vs. LayerNorm in validation From the left, BatchNorm · LayerNorm — BatchNorm vs. LayerNorm in validation
From the left, BatchNorm · LayerNorm

Finally, passing the test dataset through the trained VoxelGNNGenerator model produces results such as the following.

This project yielded several important findings. The most significant was that layer normalization performs considerably better than batch normalization when working with graphs that have varying numbers of nodes. This insight will be helpful for building better graph neural networks in the future.

The following directions are planned for future work:

Implement Building Generation Interface: Develop an interface, in the form of a web application or a CAD plugin, that allows the two input graphs to be drawn easily and scheme designs for buildings to be generated quickly.
Apply Sun Light Regulation: Apply building regulations, particularly the fundamental daylight regulations, so that the generated design proposals comply with these requirements.
Evaluate with metrics for generative model: Implement more reliable methods for measuring generative-model quality, such as the FID score, the Inception score, and other established metrics.

Introduction 🔗

Raw Data

Representing Voxels as a Graph

VoxelGNNs

Training VoxelGNNs

Building Generation

Future Works

References