We first map the node features to the embedding space \((1)\),
then perform \(T\) rounds of message passing.
In each round, we compute the message for node \(i\) as the mean of messages from its neighboring nodes \((2)\),
and mean-pool the embeddings of all nodes that share the same program type to obtain the master node embedding \((3)\).
Lastly, we update the node embeddings through a residual connection to mitigate vanishing gradients \((4)\).
After \(T = 5\) steps of message passing, the final embedding of program node \(i\) is denoted \(x^T_i\).
\begin{align}
x^0_i &= \mathrm{MLP}^p_{enc}([x_i, z^p_i, F]) \tag{1} \\
m^t_i &= \frac{1}{|Ne(i)|} \sum_{j \in Ne(i)} \mathrm{MLP}^p_{message}([x^t_i, x^t_j]) \tag{2} \\
c^t_i &= \mathrm{Mean}_{j \in Cl(i)}(\{x^t_j\}) \tag{3} \\
x^{t+1}_i &= x^t_i + \mathrm{MLP}^p_{update}([x^t_i, m^t_i, r_{Cl(i)} c^t_i, F]) \tag{4}
\end{align}
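To make the dataflow concrete, below is a minimal NumPy sketch of Eqs. (1)–(4). The three MLPs, the neighbor sets \(Ne(i)\), the program-type clusters \(Cl(i)\), the feature \(F\), the type embeddings \(z^p_i\), and the gate \(r_{Cl(i)}\) are all hypothetical placeholders with random values and arbitrary dimensions; the sketch only illustrates the update structure, not the trained model.

```python
import numpy as np

# Sketch of the program-node message passing (Eqs. 1-4).
# All weights, features, neighbor sets, and clusters are toy placeholders.
rng = np.random.default_rng(0)

def mlp(in_dim, hidden, out_dim):
    """Toy 2-layer MLP with ReLU, returned as a closure over random weights."""
    W1, b1 = rng.normal(size=(in_dim, hidden)) * 0.1, np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, out_dim)) * 0.1, np.zeros(out_dim)
    return lambda v: np.maximum(v @ W1 + b1, 0.0) @ W2 + b2

D = 16                                   # embedding width
N = 6                                    # number of program nodes
x_raw = rng.normal(size=(N, 4))          # raw node features x_i
z_p   = rng.normal(size=(N, 3))          # program-type embeddings z^p_i
F     = rng.normal(size=8)               # shared feature F
Ne = {i: [j for j in range(N) if j != i][:2] for i in range(N)}  # toy neighbors Ne(i)
Cl = {i: list(range(N)) for i in range(N)}                       # toy clusters Cl(i)
r  = np.ones(N)                          # gate r_{Cl(i)}, assumed scalar here

enc     = mlp(4 + 3 + 8, 32, D)          # MLP^p_enc,     Eq. (1)
message = mlp(2 * D, 32, D)              # MLP^p_message, Eq. (2)
update  = mlp(3 * D + 8, 32, D)          # MLP^p_update,  Eq. (4)

# Eq. (1): encode raw features into the embedding space.
x = np.stack([enc(np.concatenate([x_raw[i], z_p[i], F])) for i in range(N)])

T = 5
for _ in range(T):
    # Eq. (2): mean of messages from the neighbors Ne(i).
    m = np.stack([np.mean([message(np.concatenate([x[i], x[j]]))
                           for j in Ne[i]], axis=0) for i in range(N)])
    # Eq. (3): master node embedding = mean over nodes sharing i's program type.
    c = np.stack([np.mean(x[Cl[i]], axis=0) for i in range(N)])
    # Eq. (4): residual update of every node embedding.
    x = x + np.stack([update(np.concatenate([x[i], m[i], r[i] * c[i], F]))
                      for i in range(N)])

print(x.shape)  # final embeddings x^T_i, shape (N, D)
```

In this sketch the residual form of Eq. (4) keeps each update additive, so gradients can flow through the identity path across all \(T\) rounds, which is the motivation stated above.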