K-Rooms clusters

April 10, 2023 ＠ Spacewalk

K-Means clustering 🕃

The K-Means clustering algorithm partitions a given set of data into K clusters. It operates by minimizing the distance differences within each cluster. The algorithm is a form of Unsupervised-Learning, and it serves to label unlabeled input data.

Among clustering methods, the K-Means algorithm belongs to the partitioning method. A partitioning method divides a given set of data into multiple partitions. For example, assume that n data objects are provided as input. The partitioning method then divides the given data into K groups, where K is less than N, and each group forms a cluster. In other words, a body of data is divided into one or more data objects.

Implementation

The operation flow of K-Means clustering consists of the following 5 steps:

Select a value of K (the number of clusters) and enter the data.
Set the initial cluster centroids at random.
Assign each data point to the cluster whose centroid is nearest.
Recalculate the cluster centroids and repeat step 4.
Terminate when the centroid locations are no longer updated.

The K-Means algorithm is implemented based on the steps above. First, the KMeans object is defined. As input, it receives the number of clusters(K) to divide, the point cloud, and iteration_count, which is the number of centroid update iterations.


    class KMeans(PointHelper):
        def __init__(self, points=None, k=3, iteration_count=20, random_seed=0):
            """KMeansCluster simple implementation using Rhino Geometry
    
            Args:
                points (Rhino.Geometry.Point3d, optional): Points to classify. Defaults to None. if points is None, make random points
                k (int, optional): Number to classify. Defaults to 3.
                iteration_count (int, optional): Clusters candidates creation count. Defaults to 20.
                random_seed (int, optional): Random seed number to fix. Defaults to 0.
            """
    
            PointHelper.__init__(self)
    
            self.points = points
            self.k = k
            self.iteration_count = iteration_count
            self.threshold = 0.1
    
            import random  # pylint: disable=import-outside-toplevel
    
            self.random = random
            self.random.seed(random_seed)

Next, the centroids of the clusters are initialized, K in number, by selecting K points from the given point cloud at random. Once the initial centroids are set, the distance between each centroid and the given point cloud is computed, and each data point is assigned to the cluster with the closest distance. All the cluster centroids are then updated. This requires computing the centroids of the initial clusters(point cloud clusters) .

Finally, the distance between the updated centroid and the previous centroid is computed. If this distance no longer changes, the process terminates. Otherwise, the steps described above are repeated.


        def kmeans(self, points, k, threshold):
            """Clusters by each iteration

            Args:
                points (Rhino.Geometry.Point3d): Initialized given points
                k (int): Initialized given k
                threshold (float): Initialized threshold

            Returns:
                Tuple[List[List[Rhino.Geometry.Point3d]], List[List[int]]]: Clusters by each iteration, Indices by each iteration
            """

            centroids = self.random.sample(points, k)

            while True:
                clusters = [[] for _ in centroids]
                indices = [[] for _ in centroids]

                for pi, point in enumerate(points):
                    point_to_centroid_distance = [
                        point.DistanceTo(centroid) for centroid in centroids
                    ]
                    nearest_centroid_index = point_to_centroid_distance.index(
                        min(point_to_centroid_distance)
                    )

                    clusters[nearest_centroid_index].append(point)
                    indices[nearest_centroid_index].append(pi)

                shift_distance = 0.0
                for ci, current_centroid in enumerate(centroids):
                    if len(clusters[ci]) == 0:
                        continue

                    updated_centroid = self.get_centroid(clusters[ci])
                    shift_distance = max(
                        updated_centroid.DistanceTo(current_centroid),
                        shift_distance,
                    )

                    centroids[ci] = updated_centroid

                if shift_distance < threshold:
                    break

            return clusters, indices

With the code above, the point clusters can be obtained by setting K. The detailed code for K-Means is available at this link.

Implemented K-means clustering , divided into 10 From the left, Given Points · Result clusters — Implemented K-means clustering, divided into 10
From the left, Given Points · Result clusters

K-Rooms clusters

As described above, the K-Means algorithm clusters the given points into K groups. Applying it makes it possible to divide a given architectural boundary into K parts. This means that K rooms can be obtained.

Before writing the code, the following order is established.

As input, the algorithm receives the building exterior wall line (a closed line) and the number of rooms to divide into.
An oriented bounding box is created, and a grid is generated from it.
The grid center points are passed to the K-Means clustering algorithm implemented above.
The grids are merged according to the indices of the clustered grid center points.
The shortest path from the core to each room is computed to create a corridor.

The K-Rooms cluster algorithm is now implemented.
The KRoomsClusters class is defined as follows, taking as input the values described in step 1 and inheriting the KMeans class.


    class KRoomsCluster(KMeans, PointHelper, LineHelper, ConstsCollection):
        """
        To use the inherited moduels, refer the link below.
        https://github.com/PARKCHEOLHEE-lab/GhPythonUtils
        """

        def __init__(self, floor, core, hall, target_area, axis=None):
            self.floor = floor
            self.core = core
            self.hall = hall
            self.sorted_hall_segments = self.get_sorted_segment(self.hall)
            self.target_area = target_area
            self.axis = axis

            KMeans.__init__(self)
            PointHelper.__init__(self)
            LineHelper.__init__(self)
            ConstsCollection.__init__(self)

Next, an OBB(oriented bounding box) is created to generate the grid. The OBB defines the grid xy axes. (See this link for the OBB creation algorithm.) The grid and grid centroids are then created, the center points of the grid and the K value are passed in to extract the clustering indices, and all the grids that belong to the same cluster are merged.

K-Rooms clustering key process, divided by 5 From the left, Grid and centroids creation · Divided rooms — K-Rooms clustering key process, divided by 5
From the left, Grid and centroids creation · Divided rooms


        def _gen_grid(self):
            self.base_rectangles = [
                self.get_2d_offset_polygon(seg, self.grid_size)
                for seg in self.sorted_hall_segments[:2]
            ]

            counts = []
            planes = []

            for ri, base_rectangle in enumerate(self.base_rectangles):
                x_vector = (
                    rg.AreaMassProperties.Compute(base_rectangle).Centroid
                    - rg.AreaMassProperties.Compute(self.hall).Centroid
                )

                y_vector = copy.copy(x_vector)
                y_transform = rg.Transform.Rotation(
                    math.pi * 0.5,
                    rg.AreaMassProperties.Compute(base_rectangle).Centroid,
                )
                y_vector.Transform(y_transform)

                base_rectangle.Translate(
                    (
                        self.sorted_hall_segments[0].PointAtStart
                        - self.sorted_hall_segments[0].PointAtEnd
                    )
                    / 2
                )

                base_rectangle.Translate(
                    self.get_normalized_vector(x_vector) * -self.grid_size / 2
                )

                anchor = rg.AreaMassProperties.Compute(base_rectangle).Centroid
                plane = rg.Plane(
                    origin=anchor,
                    xDirection=x_vector,
                    yDirection=y_vector,
                )

                x_proj = self.get_projected_point_on_curve(
                    anchor, plane.XAxis, self.obb
                )

                x_count = (
                    int(math.ceil(x_proj.DistanceTo(anchor) / self.grid_size)) + 1
                )

                y_projs = [
                    self.get_projected_point_on_curve(
                        anchor, plane.YAxis, self.obb
                    ),
                    self.get_projected_point_on_curve(
                        anchor, -plane.YAxis, self.obb
                    ),
                ]

                y_count = [
                    int(math.ceil(y_proj.DistanceTo(anchor) / self.grid_size)) + 1
                    for y_proj in y_projs
                ]

                planes.append(plane)
                counts.append([x_count] + y_count)

            x_grid = []
            for base_rectangle, count, plane in zip(
                self.base_rectangles, counts, planes
            ):
                xc, _, _ = count

                for x in range(xc):
                    copied_rectangle = copy.copy(base_rectangle)
                    vector = plane.XAxis * self.grid_size * x
                    copied_rectangle.Translate(vector)
                    x_grid.append(copied_rectangle)

            y_vectors = [planes[0].YAxis, -planes[0].YAxis]
            y_counts = counts[0][1:]
            all_grid = [] + x_grid
            for rectangle in x_grid:
                for y_count, y_vector in zip(y_counts, y_vectors):
                    for yc in range(1, y_count):
                        copied_rectangle = copy.copy(rectangle)
                        vector = y_vector * self.grid_size * yc
                        copied_rectangle.Translate(vector)
                        all_grid.append(copied_rectangle)

            union_all_grid = rg.Curve.CreateBooleanUnion(all_grid, self.TOLERANCE)

            for y_count, y_vector in zip(y_counts, y_vectors):
                for yc in range(1, y_count):
                    copied_hall = copy.copy(self.hall)
                    copied_hall.Translate(
                        (
                            self.sorted_hall_segments[0].PointAtStart
                            - self.sorted_hall_segments[0].PointAtEnd
                        )
                        / 2
                    )

                    vector = y_vector * self.grid_size * yc
                    copied_hall.Translate(vector)
                    all_grid.extend(
                        rg.Curve.CreateBooleanDifference(
                            copied_hall, union_all_grid
                        )
                    )

            self.grid = []
            for grid in all_grid:
                for boundary in self.boundaries:
                    tidied_grid = list(
                        rg.Curve.CreateBooleanIntersection(
                            boundary.boundary, grid, self.TOLERANCE
                        )
                    )

                    self.grid.extend(tidied_grid)
            self.grid_centroids = [
                rg.AreaMassProperties.Compute(g).Centroid for g in self.grid
            ]

Finally, each room is connected with a hallway, which completes the process.
(The Dijkstra algorithm was implemented for the shortest-path computation. The implementation is available at the following link, and a dedicated post on this subject will follow.)

Limitation of K-Rooms clusters

The K-Rooms Clusters algorithm implemented here is still incomplete. The images shown above are valid results when the K value is small. As the K value increases and the number of rooms grows, a problem arises in which an architecturally appropriate shape cannot be derived, as shown in the example below:

Improper shapes, divided into 13 From the left, K-Rooms · Result corridor — Improper shapes, divided into 13
From the left, K-Rooms · Result corridor

To address this problem, what remains is to define the appropriate shape and to create the logic that mitigates the issue through post-processing. The complete code of K-Rooms clusters is available at this link.