Voxels to Graph
Introduction ๐
This project
explores the key contributions of Building-GAN by AutodeskAILab:
1) a novel 3D representation called voxel graph that can encode irregular voxel grids with non-uniform space partitioning, overcoming limitations of traditional regular voxel grids, and
2) a graph-conditioned generative adversarial network (GAN) leveraging graph neural networks (GNNs).
The implementation uses their
dataset to study these approaches.
First, let's examine the structure of the raw data in the next section.
Generated Buildings
Raw Data
In the Data Collection part of the paper, it says that the dataset was created as follows:
Since there is no publicly available dataset for volumetric designs from real buildings, we create a synthetic
dataset with 120,000 volumetric designs for commercial buildings using parametric models.
The heuristics behind the parametric models are based on the rules and knowledge provided by professional architects.
Although these parametric models are able to explore possible volumetric designs, they are not capable of fitting the constraints.
Therefore, we generate the designs first and then compute the voxel graph, program graph, FAR, and TPR for each design.
(...)
Here, we consider 6 program types: lobby/corridor, restroom, stairs, elevator, office, and mechanical room.
Each program node feature includes the program type and the story level.
Each datum of the created dataset consists of three JSONs as follows:
-
graph_global_*.json:
contains the FAR (Floor Area Ratio), site area (Total land area to build a building), program ratios, and program types (6 program types described above + void type).
{
"far": 3.1144640998959425,
"site_area": 961,
"global_node": [
{
"type": 3,
"proportion": 0.04811226194453724,
(...)
},
{
"type": 0,
"proportion": 0.25526227865018364,
(...)
},
(...)
]
}
graph_global_*.json
-
graph_local_*.json:
contains the nodes.
Each node has the following attributes:
floor \(\text{F}\) (the floor level on which the node is placed),
program type \(\text{T}\), index \(\text{I}\), and neighbors that indicate connectivity between nodes.
In the local graph, the unique index of each node is created as \(\text{[F, T, I]}\),
and the neighbors contain these unique indices.
In the example below, the first node's unique index is \(\text{[0, 3, 0]}\),
and it has \(\text{[1, 3, 0]}\) as a neighbor.
Similarly, the node with index \(\text{[1, 3, 0]}\) has \(\text{[0, 3, 0]}\) as a neighbor.
{
"node": [
{
"floor": 0,
"type": 3,
"type_id": 0,
"neighbors": [[0, 0, 0], [1, 3, 0]]
(...)
},
{
"floor": 1,
"type": 3,
"type_id": 0,
"neighbors": [[1, 0, 0], [2, 3, 0], [0, 3, 0]]
(...)
},
(...)
]
}
graph_local_*.json
-
voxel_*.json:
contains the voxel voxel nodes. Each voxel node has the following attributes:
location (it serves as a unique index in the voxel graph),
program type,
dimension (depth, height, width of an irregular voxel),
coordinate (center of the voxel),
and neighbors that indicate connectivity between nodes.
{
"voxel_node": [
{
"location": [0, 2, 7],
"type": 3,
"coordinate": [0.0, 20.0, 37.0],
"dimension": [7.0, 3.0, 3.0],
"neighbors": [[1, 2, 7], [0, 1, 7], [0, 3, 7], [0, 2, 6]]
(...)
},
{
"location": [1, 2, 7],
"type": 3,
"coordinate": [7.0, 20.0, 37.0],
"dimension": [4.0, 3.0, 3.0],
"neighbors": [[0, 2, 7], [2, 2, 7], [1, 1, 7], [1, 3, 7], [1, 2, 6]]
(...)
},
(...)
]
}
voxel_*.json
Representing Voxels as a Graph
A Graph data structure consists of vertices and edges \(G(V, E)\).
The vertices are sometimes also referred to as nodes
and the edges are lines or arcs that connect any two nodes in the graph.
The graph data structure can be represented as an adjacency list (or matrix) to indicate connectivity between nodes.
To convert voxel data into a graph structure, we need to assign an index for each voxel.
The index of each voxel is \(\text{(x, y, z)} \)-shaped structure, and it is unique, so that the index can be converted to \(\text{(0, 1, 2,} \cdot \cdot \cdot \text{, n)} \)-shaped structure.
As each voxel is a node, the last index of the node is the number of voxels-1.
Since the voxel is a cuboid-shaped geometry, connectivity between voxel nodes can be determined by the 3D 6-way connectivity.
In the figure below, let's see the uppermost voxel with index \(319\) of the right diagram.
Among the six faces of this voxel, only three (right, front, and bottom) are connected to adjacent voxels.
The other faces do not touch any neighboring voxels. According to the 3D 6-way connectivity system, the node with index \(319\) has neighbor nodes with indices \(255\), \(311\), and \(318\).
Therefore, using the adjacency list representation, it can be expressed as \(\text{319: [255, 311, 318]}\).
Similarly, by the same logic, the node with index \(24\) has faces that are adjacent to four voxels located to the left, right, front, and top.
Therefore, using the adjacency list representation, it can be expressed as \(\text{24: [16, 25, 32, 88]}\).
In this system,
the number of neighboring voxel nodes is at least 3 and at most 6, since a voxel is a cuboid-shaped geometry with 6 faces.
Voxel to Graph
From the left, Unique Index ยท Adjacency List Representation
Each node in the graph data has the same shape features as other nodes.
As I mentioned above, each voxel is converted into a node, so the node's features are the voxel's geometric attributes.
In the raw data, each voxel node has the following features: coordinate \((3)\), dimension \((3)\), and location \((3)\)
(
voxel_graph_features is a flattened vector \((9)\) containing all three).
Additionally, the FAR \((1)\) feature from
graph_global_*.json and the floor level \((1)\) of each voxel are combined as part of the node features.
The final feature vector of each node has a length of \((11)\); therefore, each graph is represented as an \((\text{N}, 11)\)-shaped matrix, where \(\text{N}\) is the number of nodes.
Here, I will skip the details of processing the local graph.
You can find the code for processing the local graph at this
link.
class VoxelGraphData:
def __init__(self, voxel_graph_data: dict):
voxel_graph_types_onehot = voxel_graph_data["voxel_graph_types_onehot"]
voxel_graph_features = voxel_graph_data["voxel_graph_features"]
voxel_graph_far_per_node = torch.zeros(voxel_graph_types_onehot.shape[0], 1) + voxel_graph_data["far"]
voxel_graph_floor_levels_normalized = voxel_graph_data["voxel_graph_floor_levels_normalized"].unsqueeze(0).t()
self.x = torch.cat(
[
voxel_graph_features,
voxel_graph_far_per_node,
voxel_graph_floor_levels_normalized,
],
dim=1,
)
self.data_number = voxel_graph_data["data_number"]
self.voxel_graph_types = voxel_graph_data["voxel_graph_types"]
self.voxel_graph_types_onehot = voxel_graph_types_onehot
self.edge_index = voxel_graph_data["voxel_graph_edge_indices"]
self.voxel_graph_floor_levels = voxel_graph_data["voxel_graph_floor_levels"]
self.voxel_graph_node_coordinate = voxel_graph_data["voxel_graph_node_coordinate"]
self.voxel_graph_node_dimension = voxel_graph_data["voxel_graph_node_dimension"]
self.voxel_graph_location = voxel_graph_data["voxel_graph_location"]
self.voxel_graph_node_ratio = voxel_graph_types_onehot * voxel_graph_data["voxel_graph_node_ratio"]
self.voxel_graph_node_ratio = self.voxel_graph_node_ratio.max(dim=1)[0].unsqueeze(1)
VoxelGraphData class
Here, since PyTorch Geometric is used for training models with graphs,
node features and the connectivity between nodes are represented as
x and
edge_index, respectively.
Each graph is instantiated by
Data of PyTorch Geometric.
The raw data, excluding
x and
edge_index, can also be passed as kwargs to be used during training or evaluation.
The thing to note is that, in training with graphs, mini-batching is performed at the node level.
Therefore, you must ensure that each data field matches the number of nodes; otherwise, problems may occur during batching.
class GraphDataset(Dataset):
def __init__(self, configuration: Configuration):
super().__init__()
self.configuration = configuration
(...)
self.voxel_graph_data = []
for local_graph_file, voxel_graph_file in zip(self.local_graph_data_files, self.voxel_graph_data_files):
assert local_graph_file.split("/")[-1].split("_")[0] == voxel_graph_file.split("/")[-1].split("_")[0]
(...)
voxel_graph = torch.load(voxel_graph_file)
self.voxel_graph_data.append(
Data(
x=voxel_graph.x,
edge_index=voxel_graph.edge_index,
voxel_level=voxel_graph.voxel_graph_floor_levels,
type=voxel_graph.voxel_graph_types,
types_onehot=voxel_graph.voxel_graph_types_onehot,
coordinate=voxel_graph.voxel_graph_node_coordinate,
dimension=voxel_graph.voxel_graph_node_dimension,
location=voxel_graph.voxel_graph_location,
node_ratio=voxel_graph.voxel_graph_node_ratio,
data_number=[voxel_graph.data_number] * voxel_graph.x.shape[0],
)
)
def __getitem__(self, i):
return self.local_graph_data[i], self.voxel_graph_data[i]
def __len__(self):
return len(self.local_graph_data)
GraphDataset class
If you are curious about how to convert voxels into a graph at the code level,
you can check the detailed logic through the following
link.
VoxelGNNs
The GNN model can be broadly divided into link prediction and node classification tasks.
In the paper, the model generates a building consisting of voxels with labels given a local graph and a voxel graph, so the task corresponds to
node prediction.
The diagram below provides a brief illustration of the training process. The process follows a
WGAN-GP architecture, in which both the generator and discriminator are based on Graph Neural Networks.
VoxelGNNGenerator and VoxelGNNDiscriminator both use the local graph and the voxel graph as inputs.
Additionally, the generator takes noise
z as an input, while the discriminator takes the labels generated by VoxelGNNGenerator as an additional input.
The discriminator learns to determine whether the node labels are real or fake based on the two input graphs and
z.
The generator is then updated using both the discriminator's feedback and auxiliary losses.
Training Process
The detailed code of VoxelGNNGenerator is as follows.
The very first step in the generator's forward pass is to compute
matched_x,
which is matched by combining features from the local graph and the voxel graph.
The local graph is a bubble diagram that abstractly represents the building's spatial composition,
so it and the voxel graph have different numbers of nodes and cannot be matched 1:1.
Therefore, their features are combined by aggregating nodes of the same type, using averaging.
In this way,
the features of the local graph nodes are abstractly combined for each type of voxel node.
Then, the abstractly combined graph features are passed through the
matched_features_encoder module for encoding.
Next, these encoded matched features are concatenated with the original voxel graph features and the noise vector
z to create
combined_features.
This concatenation serves multiple purposes:
the voxel graph features
preserve the geometric details,
and the noise vector
z enables the model to
produce diverse outputs even with the same conditions.
The
combined_features are then passed through the
mlp_encoder to produce encoded graph features, which serve as the input to the
encoder of VoxelGNNGenerator.
The encoder performs message passing using the voxel graph's edge connections to generate node embeddings.
class VoxelGNNGenerator(nn.Module):
def __init__(self, configuration: Configuration):
super().__init__()
(...)
def forward(self, local_graph, voxel_graph, z):
device = local_graph.x.device
matched_x = torch.zeros((voxel_graph.x.shape[0], local_graph.x.shape[1]), device=device)
for type_idx in torch.unique(voxel_graph.type):
voxel_mask = voxel_graph.type == type_idx
local_mask = local_graph.type == type_idx
if local_mask.sum() > 0:
matched_x[voxel_mask] = local_graph.x[local_mask].mean(dim=0)
encoded_matched_features = self.matched_features_encoder(
matched_x.to(self.matched_features_encoder[0].weight.device)
)
combined_features = torch.cat(
[
encoded_matched_features,
voxel_graph.x.to(encoded_matched_features.device),
z.squeeze(0).to(encoded_matched_features.device)
], dim=-1
)
x = self.mlp_encoder(combined_features)
encoded = self.encoder(x=x, edge_index=voxel_graph.edge_index)
final_features = torch.cat([encoded, x, encoded_matched_features, voxel_graph.x, z.squeeze(0)], dim=-1)
logits = self.decoder(final_features)
# https://github.com/AutodeskAILab/Building-GAN/blob/master/util.py
label_soft = torch.nn.functional.gumbel_softmax(logits, tau=1.0)
label_hard = torch.zeros_like(label_soft)
label_hard.scatter_(-1, label_soft.argmax(dim=1, keepdim=True), 1.0)
label_hard = label_hard - label_soft.detach() + label_soft
return logits, label_hard, label_soft
VoxelGNNGenerator
Finally, the model creates
final_features by concatenating the encoded output with all previous feature representations:
the encoder output, the initial combined features, the encoded matched features, the original voxel features, and the noise vector.
This rich feature representation is passed through a decoder to produce
logits for each voxel's class prediction.
In this project, auxiliary losses are used in addition to WGAN-GP: calculating the FAR of the building generated by the model and the ratio of voxel types.
For this purpose, the
logits need to be converted to hard labels, but using simple argmax would break the gradient, so
Gumbel Softmax is used.
The temperature parameter \(\tau\) controls how discrete the output is:
when \(\tau\) is close to \(0\), the output becomes like a one-hot vector (hard),
when \(\tau = 1.0\) the output is soft, and when \(\tau\) approaches \(\infty\), the output becomes almost uniform.
\[
\,\\
y_i = \frac{\exp( ( \log(\pi_i) + g_i ) / \tau)}{\sum_{j=1}^{\tiny{K}} \exp( ( \log(\pi_j) + g_j ) / \tau)}
\,\\
\]
The discriminator's
loss
is defined as
d_fake.mean() - d_real.mean(),
which minimizes the distance between the real data distribution and the generated data distribution
(maximizing the output difference of a function satisfying the 1-Lipschitz condition).
The discriminator is trained to assign low values to fake data and high values to real data.
To minimize this distance, the generator must produce buildings with a distribution that resembles the real data.
In other words,
the generator aims to increase the discriminator's score for fake data, using adversarial
loss as the main objective,
with the above-mentioned auxiliary losses as additional objectives:
\[
\,\\
\begin{aligned}
\text{Loss}_{\text{G}} \, = &\quad\, \text{loss}_{\text{adv}} \cdot \lambda_{\text{adv}} \\
& + \text{loss}_{\text{ratio}} \cdot \lambda_{\text{ratio}} \\
& + \text{loss}_{\text{far}} \cdot \lambda_{\text{far}}
\end{aligned}
\,\\
\]
def _compute_generator_loss(self, local_graph, voxel_graph, logits, label_hard):
d_fake = self.discriminator(local_graph, voxel_graph, label_hard)
if self.configuration.USE_WGANGP:
g_loss_adv = -d_fake.mean()
else:
g_loss_adv = torch.nn.functional.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
g_loss_adv *= self.configuration.LAMBDA_ADV
label_ratio_g = label_hard.squeeze(0).sum(dim=0) / voxel_graph.num_nodes
label_ratio = voxel_graph.types_onehot.sum(dim=0) / voxel_graph.num_nodes
g_loss_ratio = torch.nn.functional.mse_loss(label_ratio_g[:-2], label_ratio[:-2])
g_loss_ratio *= self.configuration.LAMBDA_RATIO
g_loss_ratio_voids = torch.nn.functional.mse_loss(label_ratio_g[-2:], label_ratio[-2:])
g_loss_ratio_voids *= self.configuration.LAMBDA_RATIO_VOID
voxel_types_generated = label_hard.squeeze(0).argmax(dim=1)
far_unique = []
far_unique_generated = []
si = 0
for gi in range(voxel_graph.num_graphs):
each_voxel_graph = voxel_graph[gi]
each_far = each_voxel_graph.x[0][9]
each_dimension = each_voxel_graph.x[:, 3:6] * self.configuration.NORMALIZATION_FACTOR_DIMENSION
ei = si + each_voxel_graph.num_nodes
each_voxel_types_generated = voxel_types_generated[si:ei]
each_dimension_to_use = each_dimension[each_voxel_types_generated != self.configuration.VOID]
each_gfa = (each_dimension_to_use[:, 1] * each_dimension_to_use[:, 2]).sum()
each_far_generated = each_gfa / each_voxel_graph.site_area[0]
far_unique.append(each_far)
far_unique_generated.append(each_far_generated)
si = ei
g_loss_far = torch.nn.functional.mse_loss(torch.tensor(far_unique_generated), torch.tensor(far_unique))
g_loss_far *= self.configuration.LAMBDA_FAR
g_loss = g_loss_adv + g_loss_ratio + g_loss_ratio_voids + g_loss_far
return g_loss
\( \text{Loss}_{\text{G}} \) computation
The lambdas for the generator are carefully balanced to ensure stable training.
The adversarial loss takes priority with \(\lambda_{\text{adv}} = 1.0\),
while the auxiliary losses for program type ratios and FAR are weighted at \(\lambda_{\text{ratio}} = 0.1\) and \(\lambda_{\text{far}} = 0.1\) respectively.
This configuration allows the model to focus primarily on generating buildings' program distribution while maintaining architectural constraints.
Training VoxelGNNs
For this implementation, a subset of 10,000 graphs from the total 120,000 dataset is used.
The monitoring results from training on 10,000 graph data for 1,000 epochs are as follows.
Each metric commonly used for evaluating classification models was measured. All graphs show convergence.
Losses & Metrics
From the left, Generator Loss ยท Accuracy ยท Recall ยท Precision ยท F1
However, since the total number of nodes within the 10,000 data samples is approximately 4,000,000 nodes, due to the large number of nodes,
even if misclassified labels are mixed in, the metrics cannot reflect the performance differences.
Also, since VoxelGNNGenerator performs node-wise label prediction,
there is a problem in that
it is difficult to evaluate the generation quality at the graph level.
Graph-wise f1 scores
From the left,
Minimum f1 score of training dataset
Minimum f1 score of validation dataset
Weighted sum of f1 scores
To address this problem, the validation method
computes the graph-wise f1 scores on both training and validation datasets.
By calculating f1 scores for node labels separately for each graph, we can evaluate the quality of each building,
even though generating these metrics takes more time.
Therefore, the model selection criterion \(\mathcal{C}\) uses a weighted sum of the worst-performing graphs from both sets.
The model is saved when the weighted sum of f1 score improves, calculated as, where \(w_{\text{t}} = 0.05\) and \(w_{\text{v}} = 1.0\).
\[
\,\\
\mathcal{C} =
\min_{g \,\in\, G_{\text{t}}} \text{f1}(g) \cdot w_{\text{t}}
+ \min_{g \,\in\, G_{\text{v}}} \text{f1}(g) \cdot w_{\text{v}}
\,\\
\]
current_f1_score = (
f1_score_min_train * self.configuration.F1_SCORE_TRAIN_WEIGHT
+ f1_score_min_validation * self.configuration.F1_SCORE_VALIDATION_WEIGHT
)
if best_f1_score < current_f1_score:
best_f1_score = current_f1_score
torch.save(
{
"epoch_start": epoch,
"epoch_end": epoch_end,
"best_f1_score": best_f1_score,
"f1_score_train": f1_score_train,
"f1_score_validation": f1_score_validation,
"f1_score_min_train": f1_score_min_train,
"f1_score_min_validation": f1_score_min_validation,
"recall_score_train": recall_score_train,
"recall_score_validation": recall_score_validation,
"accuracy_score_train": accuracy_score_train,
"accuracy_score_validation": accuracy_score_validation,
"generator": self.generator.state_dict(),
"discriminator": self.discriminator.state_dict(),
"optimizer_generator": self.optimizer_generator.state_dict(),
"optimizer_discriminator": self.optimizer_discriminator.state_dict(),
"scheduler_generator": self.scheduler_generator.state_dict(),
},
os.path.join(self.log_dir, "states.pt"),
)
Condition to save model status
Graph-based models operate with
graphs as batch, but since computations are performed at the node level,
even when the batch size remains constant, the number of nodes varies with each input.
Therefore, when using
BatchNorm, statistical instability can occur in graphs with fewer nodes.
Particularly when using running mean and variance in
eval() mode, performance may drop significantly.
This is because graphs with fewer nodes tend to have larger differences from the overall training distribution.
As a solution to this issue,
LayerNorm can be used. It performs normalization independently for each node across the feature dimension,
making it more suitable for graphs with varying numbers of nodes and providing more stable performance during evaluation.
BatchNorm vs. LayerNorm in validation
From the left, BatchNorm ยท LayerNorm
Building Generation
Finally, by passing the test dataset through the trained VoxelGNNGenerator model, we can obtain results like the following.
Generated Buildings w/ Test Dataset
Future Works
This project revealed several important findings to me.
The most significant discovery was that layer normalization performs significantly better than batch normalization when working with graphs that have varying numbers of nodes.
This insight will be helpful for building better graph neural networks in the future.
In the future, I will try the following things:
-
Implement Building Generation Interface: Develop an interface in web or CAD plugin format that allows for easy drawing of two input graphs and quickly generating scheme designs for buildings
-
Apply Sun Light Regulation: Apply building regulations, particularly the fundamental daylight regulations, to generate design proposals that comply with these requirements
-
Evaluate with metrics for generative model: Implement more reliable methods for measuring generative model quality, such as FID score, Inception score, and other established metrics
References