Conditional Generative Adversarial Networks (cGANs) are an extension of the original Generative Adversarial Networks.
cGANs extend conventional GANs by feeding additional information into both the generator and the discriminator, which enables the generation of data that is more specific and controlled.
This information acts as a directive or constraint that specifies the type of data the generator is expected to produce.
The paper Conditional Generative Adversarial Nets says:
"Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into the both the discriminator and generator as additional input layer."
Conditional adversarial net
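For reference, the same paper writes the conditional objective by conditioning both terms of the standard GAN value function on y. Mapping it onto this post is my reading rather than something the paper states: x would be the target rectangle, y the input polygon, and z the noise vector.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x \mid y) \right]
  + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D(G(z \mid y)) \right) \right]
```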
Because my research interest lies in handling geometry and generative design, I sought to combine these interests with cGANs in order to solve a simple problem and to better understand the underlying concepts.
To experiment with the approach described above, a simple geometric problem is defined.
The task is to find the largest inscribed rectangle (LIR) within a given 2D polygon, as shown below.
The largest inscribed rectangle
After defining the geometric problem, I implemented the algorithm for finding the LIR.
In this algorithm, a given polygon is represented as a binary grid of 1s and 0s.
A value of 1 denotes a solid part, and a value of 0 denotes a void part.
The following representation of a binary grid-shaped polygon uses a 100 x 100 grid size. A larger grid size yields a more precise representation.
Representation of a geometry
From the left, Vector-shaped polygon · Binary grid-shaped polygon
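The post does not show its LIR algorithm. As a point of reference only, the sketch below uses the classic histogram-and-stack technique to find the largest axis-aligned rectangle of 1s in a binary grid. Whether the author's implementation works this way, or also considers rotated rectangles, is not stated, and the function name `largest_rectangle_in_grid` is made up for this example.

```python
from typing import List, Tuple


def largest_rectangle_in_grid(grid: List[List[int]]) -> Tuple[int, int, int, int, int]:
    """Return (area, top, left, height, width) of the largest axis-aligned
    rectangle made only of 1s. The area is 0 if the grid contains no 1s."""
    if not grid or not grid[0]:
        return (0, 0, 0, 0, 0)

    cols = len(grid[0])
    heights = [0] * cols  # heights[c]: consecutive 1s ending at the current row
    best = (0, 0, 0, 0, 0)

    for r, row in enumerate(grid):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] == 1 else 0

        # Largest rectangle in a histogram, using a stack of column indices
        # whose heights are strictly increasing.
        stack: List[int] = []
        for c in range(cols + 1):
            current = heights[c] if c < cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= current:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                w = c - left
                if h * w > best[0]:
                    best = (h * w, r - h + 1, left, h, w)
            if c < cols:
                stack.append(c)

    return best


grid = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
area, top, left, height, width = largest_rectangle_in_grid(grid)
print(area)  # 6
```

The grid is scanned row by row, so the cost is proportional to the number of cells. Ties between rectangles of equal area are resolved in favor of the first one found.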
The previous section established the problem and a method for representing geometric data.
This section describes the construction of a dataset used to train a generator that estimates the LIR for a given input polygon.
First, the function for creating a polygon with random coordinates, named _get_random_coordinates, is defined as follows:
import numpy as np

def _get_random_coordinates(
    self, vertices_count_min: int, vertices_count_max: int, scale_factor: float = 1.0
) -> np.ndarray:
    """Generate a random polygon that does not self-intersect

    Args:
        vertices_count_min (int): random vertices count minimum value
        vertices_count_max (int): random vertices count maximum value
        scale_factor (float, optional): constant to scale. Defaults to 1.0.

    Returns:
        np.ndarray: random coordinates
    """

    # Note: np.random.randint excludes the upper bound
    vertices_count = np.random.randint(vertices_count_min, vertices_count_max)
    vertices = np.random.rand(vertices_count, 2)
    vertices_centroid = np.mean(vertices, axis=0)

    # Sort the vertices by their angle around the centroid
    coordinates = sorted(vertices, key=lambda p, c=vertices_centroid: np.arctan2(p[1] - c[1], p[0] - c[0]))
    coordinates = np.array(coordinates)
    coordinates[:, 0] *= scale_factor
    coordinates[:, 1] *= scale_factor

    return coordinates
This algorithm sorts the polygon vertices by the angle from the center to each vertex, so that the resulting polygon does not self-intersect.
The complete process of creating a random polygon is illustrated below, and the corresponding code is available at this link.
The process of creating a random polygon
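The effect of the angular sort can be checked with a small standalone sketch. It assumes nothing beyond the standard library; `sort_by_angle` and `signed_area` are hypothetical helper names, not part of the post's code. Sorting by angle around the mean point yields a counter-clockwise vertex order, so the shoelace formula should return a positive area.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sort_by_angle(points: List[Point]) -> List[Point]:
    """Order points counter-clockwise by their angle around the mean point."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def signed_area(polygon: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


# The corners of a unit square, given in a self-intersecting ("bow-tie") order
square = sort_by_angle([(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)])
print(signed_area(square))  # 1.0

random.seed(0)
points = [(random.random(), random.random()) for _ in range(8)]
polygon = sort_by_angle(points)
print(signed_area(polygon) > 0)
```

Every vertex is visible from the mean point after sorting, which is why the outline does not cross itself as long as no two vertices share the same angle.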
After each random polygon is created, it must be resized to match the grid size, which was set to 256x256.
These vector-shaped polygons are then converted to binary grid-shaped polygons consisting of 1s and 0s, which is straightforward with OpenCV.
Through this process, 5,000 data samples were created, as shown in the figure below. Additional samples can be generated easily if required.
The created random polygons
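The post performs the vector-to-grid conversion with OpenCV (a polygon-fill call such as cv2.fillPoly would be the usual route, though the exact call is not shown). The sketch below uses plain NumPy and the even-odd rule instead, so that it runs without OpenCV. `rasterize_polygon` is a hypothetical name, and testing each cell at its center is an assumption of this example.

```python
import numpy as np


def rasterize_polygon(coordinates: np.ndarray, grid_size: int) -> np.ndarray:
    """Return a (grid_size, grid_size) uint8 grid: 1 inside the polygon, 0 outside.

    `coordinates` is an (N, 2) array of x, y vertices already scaled to the
    range [0, grid_size]. Rows index y and columns index x.
    """
    centers = np.arange(grid_size) + 0.5
    px, py = np.meshgrid(centers, centers)  # px[row, col] = x, py[row, col] = y
    inside = np.zeros((grid_size, grid_size), dtype=bool)

    x0, y0 = coordinates[:, 0], coordinates[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)
    for ax, ay, bx, by in zip(x0, y0, x1, y1):
        if ay == by:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does a ray cast in +x from the cell center cross the edge (a, b)?
        straddles = (ay > py) != (by > py)
        x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
        # An odd number of crossings means the cell center is inside
        inside ^= straddles & (px < x_cross)

    return inside.astype(np.uint8)


square = np.array([[2.0, 2.0], [6.0, 2.0], [6.0, 6.0], [2.0, 6.0]])
grid = rasterize_polygon(square, grid_size=8)
print(int(grid.sum()))  # 16
```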
This section describes the construction of a model based on the DCGAN architecture for 256x256 data and the implementation of a geometric loss function.
In the LIR problem context defined above, the extra information y corresponds to an input polygon.
The following forward propagation method takes two inputs: a noise vector and an input polygon.
The input_polygon is first flattened, and the reshaped tensor is then mapped to feature space by passing through fully connected layers.
The output (128) of the linear transformation is concatenated with the noise (128) tensor.
This concatenation (256) allows the model to use both the random noise and the information from the input_polygon to generate the output.
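import torch
from torch import nn

class LirGenerator(nn.Module, ModelConfig):
    ( ... )

    def forward(self, noise, input_polygon):
        # Flatten the polygon grid and map it to a 128-dimensional feature vector
        fc = self.linear(input_polygon.reshape(input_polygon.shape[0], -1))
        # Concatenate the noise (128) and the polygon features (128)
        x = torch.cat([noise, fc], dim=1)
        x = x.reshape(x.shape[0], 256, 1, 1)
        x = self.main(x)
        if self.use_tanh:
            return nn.Tanh()(x)
        return nn.Sigmoid()(x)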
Similarly, the discriminator takes the input polygon as an additional input.
In the forward propagation method of the discriminator, the rectangle and input_polygon tensors share the same shape.
They are therefore concatenated along the channel dimension and passed to the main layer.
import torch
from torch import nn

class LirDiscriminator(nn.Module, ModelConfig):
    def __init__(self):
        super().__init__()

        # Two input channels: the rectangle and the input polygon
        self.main = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, kernel_size=4, stride=2, padding=0, bias=False),
            nn.AdaptiveAvgPool2d(1),
            nn.Sigmoid(),
        )

        self.to(self.DEVICE)

    def forward(self, rectangle, input_polygon):
        # Stack the rectangle and the polygon along the channel dimension
        x = torch.cat([rectangle, input_polygon], dim=1)
        return self.main(x).view(-1, 1).squeeze(1)
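To see why the discriminator ends with an adaptive average pooling layer, it helps to trace the spatial size through the five convolutions listed above. The sketch below applies the standard output-size formula for a convolution with dilation 1 to a 256x256 input; `conv2d_output_size` is a helper written for this example.

```python
def conv2d_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride, padding) of the five Conv2d layers in the discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 0)]

size = 256
sizes = [size]
for kernel_size, stride, padding in layers:
    size = conv2d_output_size(size, kernel_size, stride, padding)
    sizes.append(size)

print(sizes)  # [256, 128, 64, 32, 16, 7]
```

The last convolution leaves a 7x7 map rather than a single value, so the pooling layer averages it down to one score per sample before the sigmoid.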
Next, the additional loss functions that compute the geometric features of the generated data are defined. These losses are intended to help train the generator more stably.
The losses consist of:
- BCE loss is a standard loss function used for binary classification tasks. Conventional GANs use only the adversarial loss, but without these additional losses the generator did not produce rectangles during training.
- DIoU loss, or Distance Intersection over Union loss, measures the dissimilarity between two boxes. It penalizes both a low overlap and a large distance between the box centers.
- Feasibility loss measures how well the generated rectangle fits within the input polygon and the target rectangle, without extending beyond their boundaries or underfitting within them.
- Connectivity loss verifies whether the generated rectangle forms a single connected piece, using the labeling function.
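The post does not show how these geometric terms are implemented. The sketch below illustrates the quantities behind three of them in plain NumPy, under assumptions of my own: boxes are given as (x_min, y_min, x_max, y_max), masks are binary grids, the feasibility term is reduced to its "extending beyond the polygon" part only, and a simple flood fill stands in for the labeling function. These are evaluation-style functions, not differentiable PyTorch losses, and all three function names are hypothetical.

```python
import numpy as np


def diou_loss(box_a, box_b) -> float:
    """DIoU loss, 1 - IoU + d^2 / c^2, for boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    iou = inter / union if union > 0 else 0.0

    # d^2: squared distance between the two box centers
    center_dist_sq = ((ax0 + ax1 - bx0 - bx1) / 2) ** 2 + ((ay0 + ay1 - by0 - by1) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both
    diag_sq = (max(ax1, bx1) - min(ax0, bx0)) ** 2 + (max(ay1, by1) - min(ay0, by0)) ** 2

    penalty = center_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return 1.0 - iou + penalty


def outside_fraction(rectangle_mask: np.ndarray, polygon_mask: np.ndarray) -> float:
    """Fraction of rectangle cells that fall outside the polygon (0 is best)."""
    rectangle = rectangle_mask.astype(bool)
    total = rectangle.sum()
    if total == 0:
        return 0.0
    return float((rectangle & ~polygon_mask.astype(bool)).sum() / total)


def count_components(mask: np.ndarray) -> int:
    """Count 4-connected components of nonzero cells with a flood fill."""
    mask = np.asarray(mask).astype(bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            count += 1
            stack = [(r, c)]
            seen[r, c] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return count
```

Unlike plain IoU, the DIoU loss still changes when the two boxes do not overlap at all, because the center-distance term remains informative. A component count above 1 would indicate that the generated output has broken into several pieces.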
The base models and loss functions are now defined. As an initial check, the model is trained on a single data point to verify that the problem has a structure that can be learned.
The process of training on a single data sample for 2,000 epochs
From the left, ground truth · training process
Training then proceeds on the 5,000 data samples prepared earlier.
The configuration used for training the models is as follows:
"""Train models with Geometric loss and Tanh using all data"""
"""Test base models using all data"""
from torch.utils.data import DataLoader

lir_dataset = LirDataset()
lir_dataloader = DataLoader(
    dataset=lir_dataset,
    batch_size=32,
    shuffle=True,
    drop_last=True,
)

lir_generator = LirGenerator(use_tanh=True)
lir_discriminator = LirDiscriminator()
lir_geometric_loss_function = LirGeometricLoss(
    bce_weight=1.0, diou_weight=0.5, feasibility_weight=0.01, connectivity_weight=0.01
)

lir_gan_trainer = LirGanTrainer(
    epochs=1000,
    lir_generator=lir_generator,
    lir_discriminator=lir_discriminator,
    lir_dataloader=lir_dataloader,
    lir_geometric_loss_function=lir_geometric_loss_function,
    initial_weights_key=ModelConfig.XAVIER,
    log_interval=1,
    use_gradient_penalty=True,
    use_lr_scheduler=True,
    is_record=True,
    record_name="with-geometric-loss-all-data",
)

lir_gan_trainer.set_seed()
lir_gan_trainer.train()
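The four weights passed to LirGeometricLoss suggest that the terms are combined as a weighted sum, although the post does not show that code. The sketch below assumes exactly that. The `combine_losses` helper is hypothetical, and the per-term values are made up purely to illustrate how small weights keep large-magnitude terms from dominating.

```python
from typing import Dict


def combine_losses(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named loss terms; every term must have a weight."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Weights taken from the configuration above
weights = {"bce": 1.0, "diou": 0.5, "feasibility": 0.01, "connectivity": 0.01}

# Made-up values for one batch, NOT measurements from the post
terms = {"bce": 0.7, "diou": 1.2, "feasibility": 30.0, "connectivity": 4.0}

total = combine_losses(terms, weights)
print(round(total, 2))  # 1.64
```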
The number of epochs and the batch size were set to 1000 and 32, respectively.
In addition, to qualitatively assess the generated data, the trainer was configured to visualize it whenever the given log_interval is reached.
The following figures show the training process corresponding to the configuration above:
Training process
The left 3 samples are not included in the dataloader, whereas the right 3 are.
The loss stops decreasing at around 200 epochs. Training was stopped at 700 epochs, and the model was then loaded for qualitative evaluation. In this case, the ground truth is printed alongside the generated output.
Evaluating the generator
From the top, generated data · ground truth
The left 3 samples are not included in the dataloader, whereas the right 3 are.
As shown above, the model produces accurate results for the data on which it was trained, which explains why the loss no longer decreased.
Performance on the test data, however, is limited. A notable observation is that when a relatively simple polygon is provided as input, the model produces results similar to the ground truth.
When defining the loss function, I expected that incorporating a geometric loss function would help the model generalize.
Two versions were therefore compared: one using BCELoss with the geometric loss, and another using BCELoss without it.
The additional loss functions, however, appear to have no measurable effect.
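The comparison above is qualitative. One way to put a number on it would be to compute the mean intersection over union (IoU) between generated and ground-truth grids on held-out samples, once for each model variant. The sketch below is such a metric in NumPy. It assumes generator outputs scaled to [0, 1] and a 0.5 threshold (with Tanh outputs in [-1, 1] the threshold would need to change), and the function names are hypothetical.

```python
import numpy as np


def mask_iou(predicted: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a predicted grid (values in [0, 1]) and a binary target grid."""
    pred = predicted >= threshold
    true = target >= 0.5
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both grids are empty: treat as a perfect match
    return float(np.logical_and(pred, true).sum() / union)


def mean_iou(predictions, targets) -> float:
    """Average IoU over paired predictions and targets."""
    scores = [mask_iou(p, t) for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)


target = np.zeros((4, 4))
target[0:2, 0:2] = 1.0      # 4 cells
predicted = np.zeros((4, 4))
predicted[0:2, 0:3] = 0.9   # 6 cells, 4 of them overlapping the target

print(round(mask_iou(predicted, target), 3))  # 0.667
```

Reporting this value separately for samples inside and outside the training set would make the generalization gap described above measurable.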
The following directions could be explored further:
- Increasing the volume of training data.
Because the DataCreator class generates data through the LIR algorithm, the dataset can be expanded to any desired size.
It is therefore worth increasing the amount of training data, making the model structure more complex, letting the model overfit, and then examining whether it generalizes.
- Experimenting with a dataset that has fewer vertices and simpler geometry.
In this project, the training data was generated by creating random geometries with a number of vertices between 3 and 25 and computing the Largest Inscribed Rectangle for each.
This is why the polygon datasets are highly complex, as seen in the Data preparation section.
It would therefore be worthwhile to simplify and generalize the training data before training the model.