Exploring Generative 3D Shapes Using Autoencoder Networks

03/22/2025

INTRODUCTION

3D shapes have not gained the full benefit of machine learning, despite the vast number of 3D shapes now available on the internet. This is mainly because machine learning algorithms require a consistent representation of input and output data, such as an orthogonally aligned grid (i.e., pixels in images). Unstructured triangle meshes are the most popular surface representation in computer graphics, but their topological structures usually differ from one another, hindering the use of machine learning. In this paper, we present a new parameterization technique for efficiently converting a given unstructured mesh into a manifold mesh with consistent connectivity using depth information. We achieve a compact and explicit parameterization of a 3D shape by representing the shape as a height field. The main benefit of our parameterization is the generation of input and output data that is ready for machine learning (Figure 1, middle). From many shapes in the same category, our autoencoder network constructs a manifold of these shapes. Using the low-dimensional representation from the autoencoder, we can generate and explore variations of the 3D shapes at interactive rates.

Figure: from the left, an unstructured mesh · a quad mesh with consistent topology, compactly parameterized as a height map · synthesized new shapes.

Our contributions are summarized as follows:
· A compact and efficient parameterization of 3D shapes.
· An autoencoder to construct a manifold of 3D shapes.
· A direct manipulation interface to explore generative shapes.

PARAMETERIZATION OF 3D SHAPES

In computer graphics, 3D geometries are often available as polygon soup: non-manifold, unoriented triangle meshes that may contain self-intersections.
It is very challenging to construct a consistent representation of such unstructured data. Our parameterization constructs a quad mesh with consistent topology to explicitly represent 3D shapes. The quad mesh is efficiently computed from depth maps obtained by multi-view projection.

Depth Map

First, we set up a bounding box that encloses all the training shapes. Then, we divide each face of the bounding box to construct a Cartesian grid. From the center of each grid cell, we shoot a ray in the inward direction \(-\vec{N}\), where \(\vec{N}\) is the normal of the bounding box face. For every grid cell, we record the depth, i.e., the distance the ray travels before intersecting any of the triangles in the object. Since the training shapes are always inside the bounding box, the depth takes a value in the range \((0, D_{max}]\), where \(D_{max}\) is the maximum depth of the bounding box; cells whose ray does not intersect the shape are assigned \(D_{max}\).

Figure: depth computation.

For the car shapes, we use a bounding box with dimensions \(2\,\text{m} \times 2\,\text{m} \times 6\,\text{m}\). Each face of the bounding box is divided at a resolution where the grid size \(\Delta_{grid}\) is \(1\,\text{cm}\).

Shrink Wrapping Parameterization

We propose to use the shrink wrapping approach [Kobbelt et al. 1999] to consistently parameterize the 3D shapes for machine learning. Shrink wrapping is a technique for constructing a mesh with subdivision connectivity by projecting the vertices onto the target mesh while iteratively subdividing a coarse base mesh. In this paper, we use a cube as the base mesh because most cars have a box-like geometry at a coarse resolution. We predefine the projection direction \(\vec{d}\) for each subdivision vertex of the cube such that the projection height \(h\) alone determines the position of the vertex.
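The depth-map step above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`ray_triangle_depth`, `depth_map`) and the per-cell loop are my own, and the ray/triangle test is the standard Möller–Trumbore algorithm, which the paper does not specify.

```python
import numpy as np

def ray_triangle_depth(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection; returns depth t or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:            # ray parallel to triangle plane
        return None
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv
    return t if t > eps else None

def depth_map(triangles, face_origin, axes, normal, n_u, n_v, grid, d_max):
    """Depth map for one bounding-box face.

    From the center of each grid cell, shoot a ray in the inward
    direction -N and record the distance to the first triangle hit;
    cells whose ray misses the shape keep the value d_max.
    """
    depth = np.full((n_u, n_v), d_max)
    direction = -normal
    for i in range(n_u):
        for j in range(n_v):
            origin = (face_origin
                      + (i + 0.5) * grid * axes[0]
                      + (j + 0.5) * grid * axes[1])
            for (v0, v1, v2) in triangles:
                t = ray_triangle_depth(origin, direction, v0, v1, v2)
                if t is not None and t < depth[i, j]:
                    depth[i, j] = t
    return depth
```

In practice one would vectorize the ray casting or use a spatial acceleration structure; the doubly nested loop is kept only for clarity.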
Figure: predefined projection directions \(\vec{d}\).

Projecting along a constant direction instead of a variable one reduces the number of parameters needed to encode the movement of a vertex from three (an XYZ displacement) to one (a scalar height):

\[
v_{new} = v_{original} + h \cdot \vec{d}
\]

If a vertex lies on the edge between the faces with normals \(-x\) and \(+z\), the projection direction is \(\vec{d} = (-1, 0, +1)\). If a vertex sits at the corner shared by the three faces with normals \(+x\), \(-y\), and \(-z\), the projection direction is \(\vec{d} = (+1, -1, -1)\). The integer values of the XYZ components of the projection direction help to accelerate the following ray intersection computation.

Given the set of depth surfaces \(\mathcal{S}\) for an object, we first move the eight corner vertices of the base cube onto the object's surface. Each corner vertex should be placed so that the base cube approximates the object as closely as possible. In each subdivision iteration, we subdivide the cube mesh in half by adding vertices at the centers of the edges and quad faces. From each newly added vertex, we shoot a ray in the direction \(\vec{d}\) to find the first intersection point \(\mathbf{p}_s\) against all the depth surfaces \(s \in \mathcal{S}\). If the ray does not intersect any depth surface, we shoot a ray in the opposite direction \(-\vec{d}\). We denote by \(h\) the projection height, i.e., how far the subdivision point travels along the ray direction \(\vec{d}\) to reach the intersection point \(\mathbf{p}\).

Figure: shrink wrapping.

MACHINE LEARNING

Autoencoder Network

So far, we have described how a 3D shape is parameterized as a fixed-size high-dimensional vector.
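The rule for assigning a constant integer direction to each cube vertex, and the single-scalar update it enables, can be sketched as follows. This is my own reading of the face/edge/corner examples above (the helper names are hypothetical): each component of \(\vec{d}\) is \(-1\) or \(+1\) when the vertex lies on the corresponding min or max face of the cube, and \(0\) otherwise.

```python
import numpy as np

def projection_direction(v, lo=0.0, hi=1.0, eps=1e-9):
    """Integer projection direction for a vertex on the base cube surface.

    Component k is -1 if the vertex lies on the cube's min face along
    axis k, +1 on the max face, and 0 otherwise. Face-interior vertices
    thus get an axis direction, edge vertices a face diagonal, and
    corner vertices a space diagonal.
    """
    d = np.zeros(3)
    for k in range(3):
        if abs(v[k] - lo) < eps:
            d[k] = -1.0
        elif abs(v[k] - hi) < eps:
            d[k] = +1.0
    return d

def project(v, h, d):
    """Single-scalar vertex update: v_new = v_original + h * d."""
    return np.asarray(v) + h * np.asarray(d)
```

With a unit base cube, a vertex on the edge shared by the \(-x\) and \(+z\) faces gets \((-1, 0, +1)\), and the corner touching the \(+x\), \(-y\), \(-z\) faces gets \((+1, -1, -1)\), matching the examples in the text.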
This high-dimensional space is very difficult to explore manually because there are too many parameters to tweak. Here, we use an autoencoder to construct a nonlinear mapping between a reduced number of parameters and the high-dimensional parameters.

Figure: autoencoder network.

RESULT

Since the output of the network is fully determined by the input of these ten neurons, we can synthesize new shapes by changing the decoder's input values \(\mathbf{q}\) between zero and one, which is the range of the sigmoid function.

Figure: from the top, examples of input and output of our autoencoder network · synthesized car shapes.

Interactive Exploration

So far, we have described how to consistently parameterize shapes and reduce their dimensionality using the autoencoder. However, it is sometimes difficult to determine how to manipulate the parameters \(\mathbf{q}\) to obtain a desirable shape, since the relationship between the parameters and the resulting shape is not obvious. Hence, our interface allows the user to interactively specify \(\mathbf{x}'_{i}\), the target position for a vertex \(i\), to steer the synthesis results (see the inset figure). The interface optimizes the decoder input \(\mathbf{q}\) so that the position of the output shape's vertex \(\mathbf{x}_{i}(\mathbf{q})\) is as close as possible to the target, by minimizing the following error:

\[
E(\mathbf{q}) = ||\mathbf{x}_{i}(\mathbf{q}) - \mathbf{x}'_{i}||_{2}
\]

We can analytically compute the gradient of the error \(E\) with respect to the low-dimensional parameter \(\mathbf{q}\). Once the gradient is computed, we update the parameter using Newton-Raphson iterations.

Figure: interactive exploration.
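The direct-manipulation loop can be sketched as a small optimization over the decoder input. This is only an illustration under stated assumptions: `decode` stands in for the trained decoder (here a caller-supplied function, not the paper's network), the squared form of \(E\) is minimized for smoothness, and the gradient is estimated by central finite differences rather than the analytic backpropagation the paper uses; the update shown is plain gradient descent with clipping to the sigmoid's \([0, 1]\) range, in place of the Newton-Raphson step.

```python
import numpy as np

def fit_q(decode, i, x_target, q0, lr=0.1, iters=200, fd=1e-5):
    """Optimize the decoder input q so that vertex i of the decoded
    shape approaches the user-specified target position x_target.

    decode(q) maps a low-dimensional code q to an (n, 3) array of
    vertex positions. E(q) = ||x_i(q) - x_target||^2 is minimized by
    gradient descent with a finite-difference gradient estimate.
    """
    q = np.clip(np.array(q0, dtype=float), 0.0, 1.0)

    def energy(q):
        r = decode(q)[i] - x_target
        return float(r @ r)

    for _ in range(iters):
        g = np.zeros_like(q)
        for k in range(q.size):          # central differences per component
            dq = np.zeros_like(q)
            dq[k] = fd
            g[k] = (energy(q + dq) - energy(q - dq)) / (2 * fd)
        q = np.clip(q - lr * g, 0.0, 1.0)  # keep q in the sigmoid's range
    return q
```

For a differentiable decoder the finite-difference loop would be replaced by the analytic gradient, which is what makes the interaction run at interactive rates.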