Shape-conditional GANs

What are Conditional Generative Adversarial Networks ❓

Conditional Generative Adversarial Networks (cGANs) extend the original Generative Adversarial Networks (GANs) by feeding additional information into both the generator and the discriminator, which makes the generated data more specific and controllable. This information acts as a directive or constraint that specifies the type of data the generator is expected to produce.

The Conditional Generative Adversarial Nets paper (Mirza and Osindero, 2014) says:
" Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into the both the discriminator and generator as additional input layer. "

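In the paper, this conditioning turns the original GAN minimax game into the objective below. Here x is real data, z is the noise vector, and both players also receive y. In this post, y is the input polygon and x is its largest inscribed rectangle.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]
```
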
Because my research interest lies in handling geometry and generative design, I sought to combine these interests with cGANs in order to solve a simple problem and to better understand the underlying concepts.

Problem definition

To experiment with the approach described above, I defined a simple geometric problem: finding the largest inscribed rectangle (LIR) within a given 2D polygon, as shown below.

The largest inscribed rectangle

After defining the geometric problem, I implemented an algorithm for finding the LIR. In this algorithm, a polygon is represented as a binary grid of 1s and 0s. A value of 1 denotes a solid cell (inside the polygon), and a value of 0 denotes a void cell. The binary grid-shaped polygon below uses a 100 x 100 grid. A larger grid yields a more precise representation, at the cost of more cells to process.

Representation of a geometry
From the left, vector-shaped polygon · binary grid-shaped polygon
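
The post does not show the LIR algorithm itself. The sketch below assumes the rectangle is axis-aligned. Under that assumption, one standard way to find the largest rectangle of 1s in a binary grid is the maximal-rectangle method. Each row is treated as the base of a histogram of consecutive 1s above it, and a monotonic stack finds the largest rectangle under that histogram. The function name and the (row, col, height, width) return format are hypothetical. This is not necessarily the author's implementation.

```python
import numpy as np


def largest_inscribed_rectangle(grid: np.ndarray) -> tuple[int, int, int, int]:
    """Largest axis-aligned rectangle of 1s in a binary grid, as (row, col, height, width)."""
    rows, cols = grid.shape
    heights = np.zeros(cols, dtype=int)
    best, best_area = (0, 0, 0, 0), 0
    for r in range(rows):
        # heights[c] = number of consecutive solid cells ending at row r in column c
        heights = np.where(grid[r] > 0, heights + 1, 0)
        # Monotonic stack of column indices with strictly increasing heights
        stack: list[int] = []
        for c in range(cols + 1):
            h = int(heights[c]) if c < cols else 0  # height-0 sentinel flushes the stack
            while stack and int(heights[stack[-1]]) >= h:
                height = int(heights[stack.pop()])
                # The popped bar spans from just after the new stack top up to column c - 1
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if height * width > best_area:
                    best_area = height * width
                    best = (r - height + 1, left, height, width)
            stack.append(c)
    return best
```

This runs in O(rows x cols) time. That matters if it is used to compute a ground-truth rectangle for every polygon in a large dataset.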

Data preparation

The previous section established the problem and a method for representing geometric data. This section describes the construction of a dataset used to train a generator that estimates the LIR for a given input polygon.

First, the function for creating a polygon with random coordinates, named _get_random_coordinates, is defined as follows:


    import numpy as np


    def _get_random_coordinates(
        self, vertices_count_min: int, vertices_count_max: int, scale_factor: float = 1.0
    ) -> np.ndarray:
        """Generate a random simple (non-self-intersecting) polygon

        Args:
            vertices_count_min (int): minimum number of vertices (inclusive)
            vertices_count_max (int): maximum number of vertices (exclusive, as in np.random.randint)
            scale_factor (float, optional): constant to scale the coordinates by. Defaults to 1.0.

        Returns:
            np.ndarray: (N, 2) vertex coordinates ordered by increasing angle around their centroid
        """

        vertices_count = np.random.randint(vertices_count_min, vertices_count_max)
        # Sample random vertices in the unit square
        vertices = np.random.rand(vertices_count, 2)
        vertices_centroid = np.mean(vertices, axis=0)

        # Sort the vertices by polar angle around the centroid; connecting them in this
        # order gives a star-shaped polygon whose edges do not cross each other
        angles = np.arctan2(vertices[:, 1] - vertices_centroid[1], vertices[:, 0] - vertices_centroid[0])
        coordinates = vertices[np.argsort(angles)]

        coordinates *= scale_factor

        return coordinates

This function sorts the vertices by the polar angle from their centroid to each vertex, so the resulting polygon does not self-intersect. The complete process of creating a random polygon is illustrated below, and the corresponding code is available at this link.

The process of creating a random polygon

Each random polygon is then scaled to fit the grid, which was set to 256 x 256. It is also converted from a vector-shaped polygon into a binary grid-shaped polygon of 1s and 0s, which is straightforward with OpenCV. Through this process, 5,000 samples were created, as shown in the figure below. Additional samples can easily be generated if required.

The created random polygons
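
The post uses OpenCV for the rasterization step without naming the call. cv2.fillPoly is one OpenCV function that fills a polygon given its vertices as int32 arrays. To keep the sketch dependency-free, the version below uses NumPy instead. It scales the vertices into a 256 x 256 grid and applies the even-odd (ray-crossing) rule to every pixel center. Two details are assumptions: the function name, and fitting the polygon's bounding box to the grid while preserving its aspect ratio.

```python
import numpy as np


def rasterize_polygon(coordinates: np.ndarray, grid_size: int = 256) -> np.ndarray:
    """Return a (grid_size, grid_size) uint8 grid: 1 inside the polygon, 0 outside."""
    coords = np.asarray(coordinates, dtype=float)
    # Fit the polygon's bounding box into [0, grid_size], keeping its aspect ratio
    mins = coords.min(axis=0)
    span = (coords.max(axis=0) - mins).max()
    coords = (coords - mins) / span * grid_size

    # Coordinates of every pixel center; rows index y, columns index x
    ys, xs = np.mgrid[0:grid_size, 0:grid_size] + 0.5

    # Even-odd rule: a pixel center is inside if a ray cast toward +x crosses
    # the polygon boundary an odd number of times
    inside = np.zeros((grid_size, grid_size), dtype=bool)
    x_prev, y_prev = coords[-1]
    for x_curr, y_curr in coords:
        if y_curr != y_prev:  # horizontal edges never cross a horizontal ray
            crosses = (y_curr > ys) != (y_prev > ys)
            x_hit = x_prev + (ys - y_prev) * (x_curr - x_prev) / (y_curr - y_prev)
            inside ^= crosses & (xs < x_hit)
        x_prev, y_prev = x_curr, y_curr
    return inside.astype(np.uint8)
```

Applied to the output of _get_random_coordinates, this yields one binary input grid per polygon.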

Building models and loss functions

This section describes the construction of a model based on the DCGAN architecture for 256x256 data and the implementation of a geometric loss function.

In the LIR problem defined above, the extra information y is the input polygon. The following forward propagation method takes two inputs: a noise vector and an input polygon. The input_polygon is first flattened, and the flattened tensor is then mapped to a feature space by fully connected layers.

The 128-dimensional output of the linear transformation is concatenated with the 128-dimensional noise tensor. The resulting 256-dimensional vector is reshaped to (batch, 256, 1, 1), so the layers in self.main use both the random noise and the information from input_polygon to generate the output.


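    import torch
    from torch import nn


    class LirGenerator(nn.Module, ModelConfig):

        ( ... )

        def forward(self, noise, input_polygon):
            # Flatten each (1, 256, 256) polygon grid and map it to a 128-dim feature vector
            fc = self.linear(input_polygon.reshape(input_polygon.shape[0], -1))
            # Condition on the polygon: concatenate noise (B, 128) and features (B, 128) -> (B, 256)
            x = torch.cat([noise, fc], dim=1)
            # View the 256-dim vector as a 256-channel 1x1 feature map for the upsampling layers
            x = x.reshape(x.shape[0], 256, 1, 1)
            x = self.main(x)

            # Squash the output to (-1, 1) with tanh, or to (0, 1) with sigmoid
            if self.use_tanh:
                return torch.tanh(x)

            return torch.sigmoid(x)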
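
The post elides the layers of LirGenerator. The module below is a hypothetical sketch that fills them in with a DCGAN-style stack of transposed convolutions. Its purpose is to check the tensor shapes described above end to end:

- the polygon is flattened to 65,536 values and reduced to 128 features;
- the features are concatenated with 128-dimensional noise;
- the result is upsampled from 1 x 1 to 256 x 256.

The class name, the hidden width of 256, and all channel counts are assumptions, not the author's architecture.

```python
import torch
from torch import nn


class SketchLirGenerator(nn.Module):
    """Hypothetical cGAN generator for 256 x 256 grids; all layer sizes are assumptions."""

    def __init__(self, noise_dim: int = 128, feature_dim: int = 128):
        super().__init__()
        # Fully connected layers: flattened 256 * 256 polygon grid -> 128 features
        self.linear = nn.Sequential(
            nn.Linear(256 * 256, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, feature_dim),
        )
        # 1x1 -> 4x4 with a kernel-4, stride-1, padding-0 transposed convolution
        layers = [
            nn.ConvTranspose2d(noise_dim + feature_dim, 512, kernel_size=4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        ]
        # Each kernel-4, stride-2, padding-1 block doubles the size: 4 -> 8 -> 16 -> 32 -> 64 -> 128
        in_channels = 512
        for out_channels in (256, 128, 64, 32, 16):
            layers += [
                nn.ConvTranspose2d(in_channels, out_channels, kernel_size=4, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ]
            in_channels = out_channels
        # Final block: 128 -> 256 with a single output channel (the rectangle mask)
        layers.append(nn.ConvTranspose2d(in_channels, 1, kernel_size=4, stride=2, padding=1, bias=False))
        self.main = nn.Sequential(*layers)

    def forward(self, noise: torch.Tensor, input_polygon: torch.Tensor) -> torch.Tensor:
        fc = self.linear(input_polygon.reshape(input_polygon.shape[0], -1))  # (B, 128)
        x = torch.cat([noise, fc], dim=1)  # (B, 256)
        x = x.reshape(x.shape[0], -1, 1, 1)  # (B, 256, 1, 1)
        return torch.sigmoid(self.main(x))  # (B, 1, 256, 256), values in [0, 1]
```

In this layout, noise_dim + feature_dim must equal the input channel count of the first transposed convolution. That constraint is what lets the original forward method hard-code the reshape to 256 channels.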

Similarly, the discriminator takes the input polygon as an additional input. In its forward propagation method, the rectangle and input_polygon tensors share the same shape, (batch, 1, 256, 256). They are therefore concatenated along the channel dimension into a two-channel input and passed to the main layers.


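    import torch
    from torch import nn


    class LirDiscriminator(nn.Module, ModelConfig):
        def __init__(self):
            super().__init__()

            # Input: rectangle and polygon stacked as 2 channels, (B, 2, 256, 256)
            self.main = nn.Sequential(
                nn.Conv2d(2, 64, kernel_size=4, stride=2, padding=1, bias=False),  # -> (B, 64, 128, 128)
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),  # -> (B, 128, 64, 64)
                nn.BatchNorm2d(128),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False),  # -> (B, 256, 32, 32)
                nn.BatchNorm2d(256),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1, bias=False),  # -> (B, 512, 16, 16)
                nn.BatchNorm2d(512),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(512, 1, kernel_size=4, stride=2, padding=0, bias=False),  # -> (B, 1, 7, 7)
                nn.AdaptiveAvgPool2d(1),  # average the 7x7 map -> (B, 1, 1, 1)
                nn.Sigmoid(),  # probability that the rectangle is real for this polygon
            )

            # DEVICE is provided by the ModelConfig mixin
            self.to(self.DEVICE)

        def forward(self, rectangle, input_polygon):
            # Condition on the polygon by concatenating along the channel dimension
            x = torch.cat([rectangle, input_polygon], dim=1)
            # (B, 1, 1, 1) -> (B,)
            return self.main(x).view(-1, 1).squeeze(1)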

Next, additional loss functions that compute geometric features of the generated data are defined. Conventional GANs use only the adversarial loss, but without these additional losses the generator did not produce rectangles during training. The additional losses are therefore intended to steer the generator toward valid rectangles and to make training more stable. The losses consist of:

BCE (binary cross-entropy) loss, a standard loss function for binary classification tasks.
DIoU (Distance Intersection over Union) loss, which measures the overlap between the generated and target rectangles and also penalizes the distance between their centers: L_DIoU = 1 - IoU + ρ²/c², where ρ is the distance between the two box centers and c is the diagonal length of the smallest box enclosing both.
Feasibility loss, which measures how well the generated rectangle fits within the input polygon and the target rectangle, penalizing both extending beyond their boundaries and underfitting within them.
Connectivity loss, which uses a connected-component labeling function to check whether the generated rectangle forms a single connected piece.
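
The post does not include LirGeometricLoss itself. The sketch below shows one possible form for each geometric term. All function names are hypothetical, and each term rests on an assumption:

- DIoU is computed on (x1, y1, x2, y2) boxes. How boxes are extracted from the generated grid is not described in the post. Recent torchvision versions also provide torchvision.ops.distance_box_iou_loss.
- The feasibility term is a made-up penalty on soft (B, 1, H, W) masks.
- Connectivity counts 4-connected components with a breadth-first flood fill, in place of a library labeling function such as scipy.ndimage.label.

```python
from collections import deque

import torch


def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """DIoU loss 1 - IoU + rho^2 / c^2 for (B, 4) boxes given as (x1, y1, x2, y2)."""
    # Intersection box, clamped to zero size when the boxes do not overlap
    top_left = torch.max(pred[:, :2], target[:, :2])
    bottom_right = torch.min(pred[:, 2:], target[:, 2:])
    intersection = (bottom_right - top_left).clamp(min=0).prod(dim=1)
    area_pred = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_target = (target[:, 2:] - target[:, :2]).prod(dim=1)
    iou = intersection / (area_pred + area_target - intersection + eps)

    # rho^2: squared distance between the two box centers
    center_pred = (pred[:, :2] + pred[:, 2:]) / 2
    center_target = (target[:, :2] + target[:, 2:]) / 2
    rho2 = (center_pred - center_target).pow(2).sum(dim=1)

    # c^2: squared diagonal of the smallest box enclosing both boxes
    enclosing = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    c2 = enclosing.pow(2).sum(dim=1)

    return (1 - iou + rho2 / (c2 + eps)).mean()


def feasibility_loss(generated: torch.Tensor, polygon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical feasibility penalty on (B, 1, H, W) masks with values in [0, 1]."""
    # Generated pixels that spill outside the input polygon
    outside_polygon = (generated * (1 - polygon)).mean()
    # Generated pixels outside the target rectangle (overshoot)
    outside_target = (generated * (1 - target)).mean()
    # Target pixels the generated rectangle leaves empty (underfit)
    underfit = ((1 - generated) * target).mean()
    return outside_polygon + outside_target + underfit


def connectivity_penalty(mask: torch.Tensor, threshold: float = 0.5) -> int:
    """Number of extra 4-connected components in a thresholded 2D mask (0 if it is one piece)."""
    grid = (mask > threshold).tolist()
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            components += 1
            # Breadth-first flood fill marks every cell of this component
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return max(components - 1, 0)
```

As written, the connectivity count is an integer computed on a thresholded mask. It therefore passes no gradient to the generator and works more as a diagnostic than as a trainable term. The weights in the training configuration (bce_weight, diou_weight, and so on) suggest that the terms are combined as a weighted sum, but the post does not show this.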

Training and evaluating

The base models and loss functions are now defined. As an initial sanity check, the model is first trained on a single data point to verify that the problem has a learnable structure.

The process of training on a single data point for 2000 epochs
From the left, ground truth · training process
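
One way to run this sanity check without changing the training loop is to wrap the dataset in torch.utils.data.Subset. The DataLoader then only ever yields one sample. In the sketch below, a TensorDataset of random masks stands in for LirDataset, and the (1, 256, 256) sample shape is an assumption.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in for LirDataset: hypothetical (input polygon, target rectangle) mask pairs
polygons = (torch.rand(8, 1, 256, 256) > 0.5).float()
rectangles = (torch.rand(8, 1, 256, 256) > 0.5).float()
dataset = TensorDataset(polygons, rectangles)

# Restrict the dataset to its first sample so every training step sees the same pair.
# batch_size=1 (or drop_last=False) matters: with batch_size=32 and drop_last=True,
# as in the full training configuration, a one-sample dataset would yield no batches.
single_sample_loader = DataLoader(Subset(dataset, [0]), batch_size=1, shuffle=False)

polygon, rectangle = next(iter(single_sample_loader))
print(len(single_sample_loader), tuple(polygon.shape))  # 1 (1, 1, 256, 256)
```

Suppose the generator cannot reproduce this one rectangle after enough epochs. In that case, a bug in the model or the losses is a more likely cause than a lack of data.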

Training then proceeds on the full set of 5,000 samples prepared earlier. The configuration used for training the models is as follows:


    from torch.utils.data import DataLoader

    # Train the models with the geometric loss and a tanh output, using all data

    lir_dataset = LirDataset()
    lir_dataloader = DataLoader(
        dataset=lir_dataset,
        batch_size=32,
        shuffle=True,
        drop_last=True,
    )

    lir_generator = LirGenerator(use_tanh=True)
    lir_discriminator = LirDiscriminator()

    lir_geometric_loss_function = LirGeometricLoss(
        bce_weight=1.0, diou_weight=0.5, feasibility_weight=0.01, connectivity_weight=0.01
    )

    lir_gan_trainer = LirGanTrainer(
        epochs=1000,
        lir_generator=lir_generator,
        lir_discriminator=lir_discriminator,
        lir_dataloader=lir_dataloader,
        lir_geometric_loss_function=lir_geometric_loss_function,
        initial_weights_key=ModelConfig.XAVIER,
        log_interval=1,
        use_gradient_penalty=True,
        use_lr_scheduler=True,
        is_record=True,
        record_name="with-geometric-loss-all-data",
    )

    lir_gan_trainer.set_seed()
    lir_gan_trainer.train()
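
The configuration enables Xavier initialization (initial_weights_key=ModelConfig.XAVIER) and a gradient penalty (use_gradient_penalty=True). The post does not show how LirGanTrainer implements either. The sketch below shows a common form of each, with hypothetical function names:

- Xavier-normal initialization, applied through Module.apply.
- A WGAN-GP-style penalty, with the input polygon passed as the condition. It pushes the norm of the discriminator's gradient toward 1 at random interpolations between real and generated rectangles.

Whether the author uses the normal or uniform Xavier variant is an assumption.

```python
import torch
from torch import nn


def init_xavier(module: nn.Module) -> None:
    """Xavier-normal init for conv/linear weights; BatchNorm reset to the identity transform."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


def gradient_penalty(discriminator, real, fake, input_polygon):
    """WGAN-GP-style penalty E[(||grad D(x_hat | y)||_2 - 1)^2] on real/fake interpolations."""
    batch_size = real.shape[0]
    # One random mixing coefficient per sample, broadcast over (C, H, W)
    alpha = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (alpha * real + (1 - alpha) * fake.detach()).requires_grad_(True)
    scores = discriminator(interpolated, input_polygon)
    # Gradient of the scores w.r.t. the interpolated rectangles; create_graph keeps
    # the penalty differentiable so it can be backpropagated into the discriminator
    (gradients,) = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )
    gradient_norm = gradients.reshape(batch_size, -1).norm(2, dim=1)
    return ((gradient_norm - 1) ** 2).mean()
```

For example, lir_discriminator.apply(init_xavier) would initialize every layer of the discriminator. In the original WGAN-GP formulation, the critic has neither a sigmoid output nor batch normalization, because the penalty is defined per sample. The post does not show how LirGanTrainer reconciles this with the BatchNorm and Sigmoid layers in LirDiscriminator.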

The number of epochs and the batch size were set to 1,000 and 32, respectively. To assess the generated data qualitatively, the trainer was also configured to visualize the generated outputs whenever log_interval is reached (log_interval=1 here). The following figures show the training process for the configuration above:

Training process
The left 3 samples are not included in the dataloader (unseen data), whereas the right 3 are.

The loss stops decreasing at approximately 200 epochs. Training was stopped at 700 epochs, and the trained model was then loaded for qualitative evaluation, with each generated output displayed alongside its ground truth.

Evaluating the generator
From the top, generated data · ground truth
The left 3 samples are not included in the dataloader (unseen data), whereas the right 3 are.

As shown above, the model reproduces the ground truth closely for the data it was trained on. This is consistent with the training loss having plateaued: the model has largely fit the training set. Its performance on unseen test data, however, is limited. Notably, when a relatively simple polygon is provided as input, the model produces results close to the ground truth.

Conclusions & future work

When defining the loss function, I expected that adding the geometric losses would help the model generalize. I therefore compared two versions: one trained with BCELoss plus the geometric losses, and one trained with BCELoss alone. The additional losses, however, appear to have had no noticeable effect.

The following directions could be explored further:

Increasing the volume of training data.


Experimenting with a dataset that has fewer vertices and simpler geometry.

