Getting Started

Learn how to install and start using Kapacity

1 - Installation

How to install or uninstall Kapacity

Prerequisites

You need a Kubernetes cluster with kubectl configured to access it, and Helm installed, since all the components below are installed with Helm charts.

Install

Install cert-manager

Install cert-manager using Helm.

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true
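
Optionally, wait until cert-manager’s Deployments are available before continuing (a standard kubectl command; not required, but it ensures cert-manager is fully up before the components that depend on it are installed):

kubectl wait -n cert-manager deploy --all --for=condition=Available --timeout=5m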

Install Prometheus

Install Prometheus, along with kube-state-metrics, using Helm.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install \
  prometheus prometheus-community/prometheus \
  --namespace prometheus \
  --create-namespace \
  --set alertmanager.enabled=false \
  --set prometheus-node-exporter.enabled=false \
  --set prometheus-pushgateway.enabled=false

Run the following command to view the Prometheus Server’s ClusterIP and port:

kubectl get svc -n prometheus prometheus-server
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
prometheus-server   ClusterIP   10.104.214.48   <none>        80/TCP    5m
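
If you prefer, you can capture these values into shell variables for the next step (a minimal sketch using standard kubectl JSONPath; it assumes the Service exposes a single port):

PROMETHEUS_IP=$(kubectl get svc -n prometheus prometheus-server -o jsonpath='{.spec.clusterIP}')
PROMETHEUS_PORT=$(kubectl get svc -n prometheus prometheus-server -o jsonpath='{.spec.ports[0].port}')
echo "http://${PROMETHEUS_IP}:${PROMETHEUS_PORT}"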

Install Kapacity

Install Kapacity using Helm. For the Prometheus Server address parameter, use the ClusterIP and port obtained in the previous step.

helm repo add kapacity https://traas-stack.github.io/kapacity-charts
helm repo update

helm install \
  kapacity-manager kapacity/kapacity-manager \
  --namespace kapacity-system \
  --create-namespace \
  --set prometheus.address=http://<prometheus-server-clusterip>:<prometheus-server-port> 

Verify Kapacity installation

kubectl get deploy -n kapacity-system
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
kapacity-manager   1/1     1            1           5m
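
You can also wait until the Deployment reports available before moving on (standard kubectl; optional):

kubectl wait -n kapacity-system deploy/kapacity-manager --for=condition=Available --timeout=5m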

Uninstall

helm uninstall kapacity-manager -n kapacity-system
helm uninstall prometheus -n prometheus
helm uninstall cert-manager -n cert-manager

2 - Quick Start

Get started with Kapacity

2.1 - IHPA (Intelligent HPA)

Get started with IHPA

IHPA (Intelligent HPA) is an intelligent, risk-defensive, highly adaptive and customizable substitute for HPA.

You can follow the guides below to quickly try some core features of IHPA.

2.1.1 - Cron Scaling

Before you begin

You need to have a Kubernetes cluster with Kapacity installed.

Run sample workload

Download nginx-statefulset.yaml and run the following command to start an NGINX workload:

kubectl apply -f nginx-statefulset.yaml

Check if the workload is running:

kubectl get po
NAME      READY   STATUS    RESTARTS   AGE
nginx-0   1/1     Running   0          5s

Create IHPA with cron portrait provider

Download cron-portrait-sample.yaml, which looks like this:

apiVersion: autoscaling.kapacitystack.io/v1alpha1
kind: IntelligentHorizontalPodAutoscaler
metadata:
  name: cron-portrait-sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  portraitProviders:
  - type: Cron
    priority: 1
    cron:
      crons:
      - name: cron-1
        start: 0 * * * *
        end: 10 * * * *
        replicas: 1
      - name: cron-2
        start: 10 * * * *
        end: 20 * * * *
        replicas: 2
      - name: cron-3
        start: 20 * * * *
        end: 30 * * * *
        replicas: 3
      - name: cron-4
        start: 30 * * * *
        end: 40 * * * *
        replicas: 4
      - name: cron-5
        start: 40 * * * *
        end: 50 * * * *
        replicas: 5

Run the following command to create the IHPA:

kubectl apply -f cron-portrait-sample.yaml

Verify results

By checking the events of the IHPA, you can see that the replica number of the workload changes dynamically according to our configuration:

kubectl describe ihpa cron-portrait-sample
...
Events:
  Type     Reason                Age                From             Message
  ----     ------                ----               ----             -------
  Normal   CreateReplicaProfile  38m                ihpa_controller  create ReplicaProfile with onlineReplcas: 3, cutoffReplicas: 0, standbyReplicas: 0
  Normal   UpdateReplicaProfile  33m (x2 over 33m)  ihpa_controller  update ReplicaProfile with onlineReplcas: 3 -> 4, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal   UpdateReplicaProfile  23m                ihpa_controller  update ReplicaProfile with onlineReplcas: 4 -> 5, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Warning  NoValidPortraitValue  13m                ihpa_controller  no valid portrait value for now
  Normal   UpdateReplicaProfile  3m15s              ihpa_controller  update ReplicaProfile with onlineReplcas: 5 -> 1, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0

You can also verify this by watching the replica number of the workload directly.
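
For example, the following standard kubectl command streams the StatefulSet’s replica count and keeps updating it until you interrupt it:

kubectl get statefulset nginx -w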

Cleanup

Run the following commands to clean up all the resources:

kubectl delete -f cron-portrait-sample.yaml 
kubectl delete -f nginx-statefulset.yaml 

2.1.2 - Reactive Scaling

Before you begin

You need to have a Kubernetes cluster with Kapacity and Prometheus installed.

Run sample workload

Download nginx-statefulset.yaml and run the following command to start an NGINX workload:

kubectl apply -f nginx-statefulset.yaml

Check if the workload is running:

kubectl get po
NAME      READY   STATUS    RESTARTS   AGE
nginx-0   1/1     Running   0          5s

Create IHPA with dynamic reactive portrait provider

Download dynamic-reactive-portrait-sample.yaml, which looks like this:

apiVersion: autoscaling.kapacitystack.io/v1alpha1
kind: IntelligentHorizontalPodAutoscaler
metadata:
  name: dynamic-reactive-portrait-sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  portraitProviders:
  - type: Dynamic
    priority: 1
    dynamic:
      portraitType: Reactive
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 30
      algorithm:
        type: KubeHPA

Run the following command to create the IHPA:

kubectl apply -f dynamic-reactive-portrait-sample.yaml

Increase the load

Run the following command to get the ClusterIP and port of the NGINX service:

kubectl get svc nginx
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
nginx        ClusterIP   10.111.21.74   <none>        80/TCP     13m

Start a different pod to act as a client that sends requests to the NGINX service in an infinite loop, replacing the service IP and port with the values obtained in the previous step:

# Run this in a separate terminal so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://<service-ip>:<service-port> > /dev/null; done"
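
While the load generator is running, you can watch the pods being created in another terminal (optional; plain kubectl, no extra assumptions):

kubectl get po -w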

After several minutes, you can see that the workload has been scaled up by checking the events of the IHPA:

kubectl describe ihpa dynamic-reactive-portrait-sample
...
Events:
  Type    Reason                Age    From             Message
  ----    ------                ----   ----             -------
  Normal  CreateReplicaProfile  6m58s  ihpa_controller  create ReplicaProfile with onlineReplcas: 1, cutoffReplicas: 0, standbyReplicas: 0
  Normal  UpdateReplicaProfile  3m45s  ihpa_controller  update ReplicaProfile with onlineReplcas: 1 -> 6, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0

Stop generating load

In the terminal where you created the Pod that runs a busybox image, terminate the load generation by typing <Ctrl> + C.

After several minutes, you can see that the workload has been scaled down by checking the events of the IHPA:

kubectl describe ihpa dynamic-reactive-portrait-sample
...
Events:
  Type    Reason                Age    From             Message
  ----    ------                ----   ----             -------
  Normal  CreateReplicaProfile  9m58s  ihpa_controller  create ReplicaProfile with onlineReplcas: 1, cutoffReplicas: 0, standbyReplicas: 0
  Normal  UpdateReplicaProfile  6m45s  ihpa_controller  update ReplicaProfile with onlineReplcas: 1 -> 6, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal  UpdateReplicaProfile  3m15s  ihpa_controller  update ReplicaProfile with onlineReplcas: 6 -> 4, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal  UpdateReplicaProfile  2m45s  ihpa_controller  update ReplicaProfile with onlineReplcas: 4 -> 1, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0

Cleanup

Run the following commands to clean up all the resources:

kubectl delete -f dynamic-reactive-portrait-sample.yaml 
kubectl delete -f nginx-statefulset.yaml 

2.1.3 - Predictive Scaling

Before you begin

You need to have a Kubernetes cluster with Kapacity and Prometheus installed.

Make sure your Kubernetes cluster has a working DNS (like CoreDNS) to resolve Service domain names. If not, you need to adjust the configuration of Kapacity as follows:

Use the following command to view the ClusterIP and port of the Kapacity gRPC Server:

kubectl get svc -n kapacity-system kapacity-grpc-service
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kapacity-grpc-service   ClusterIP   192.168.38.172   <none>        9090/TCP   5m

Use the following command to update the Kapacity configuration, filling in the Kapacity gRPC Server address parameter with the values viewed in the previous step:

helm upgrade \
  kapacity-manager kapacity/kapacity-manager \
  --namespace kapacity-system \
  --reuse-values \
  --set algorithmJob.defaultMetricsServerAddr=<kapacity-grpc-server-clusterip>:<kapacity-grpc-server-port> 
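
For example, with the Service shown above, the flag would be set as follows (the address is just the sample value from this guide; use the one from your own cluster):

--set algorithmJob.defaultMetricsServerAddr=192.168.38.172:9090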

Install and configure Ingress NGINX Controller

Kapacity IHPA’s predictive scaling uses the “Traffic-Driven Replicas Prediction” algorithm, so at least one traffic metric is needed to use predictive scaling. Here we use Ingress NGINX as an example source of workload ingress traffic.

If your Kubernetes cluster does not yet have Ingress NGINX Controller, please refer to the official documentation for installation.

After the installation is complete, follow this document to configure it so that Prometheus can collect the metrics of Ingress NGINX.
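
To double-check that the metric is being scraped, you can query Prometheus directly (a sketch; it assumes you installed Prometheus as described in the Installation section and that curl is available locally — adjust the namespace, Service name and port otherwise):

kubectl port-forward -n prometheus svc/prometheus-server 9090:80
# in another terminal
curl -s 'http://localhost:9090/api/v1/query?query=nginx_ingress_controller_requests'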

Configure Kapacity to recognize Ingress NGINX metrics

Use the following command to add the Ingress NGINX metrics to Kapacity’s custom Prometheus metrics configuration:

kubectl edit cm -n kapacity-system kapacity-config
apiVersion: v1
data:
  prometheus-metrics-config.yaml: |
    resourceRules:
      ...
    # add Ingress NGINX metrics in rules
    rules:
    - seriesQuery: '{__name__="nginx_ingress_controller_requests"}'
      metricsQuery: round(sum(irate(<<.Series>>{<<.LabelMatchers>>}[3m])) by (<<.GroupBy>>), 0.001)
      name:
        as: nginx_ingress_controller_requests_rate
      resources:
        template: <<.Resource>>
        # note: uncomment the overrides field below if your Prometheus is installed with Prometheus Operator
        # overrides:
        #   exported_namespace:
        #     resource: namespace
    externalRules:
      ...    
kind: ConfigMap
...

As you can see, this configuration is fully compatible with the Prometheus Adapter configuration; more background information can be found in this user guide.

Then, use the following command to restart Kapacity Manager to load the latest configuration:

kubectl rollout restart -n kapacity-system deploy/kapacity-manager
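
You can wait for the restart to finish with:

kubectl rollout status -n kapacity-system deploy/kapacity-manager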

Run sample workload

  1. Download nginx-statefulset.yaml and run the following command to start an NGINX workload:
kubectl apply -f nginx-statefulset.yaml

Check if the workload is running:

kubectl get po
NAME      READY   STATUS    RESTARTS   AGE
nginx-0   1/1     Running   0          5s
  2. Download nginx-ingress.yaml and run the following command to create an Ingress for the NGINX workload:
kubectl apply -f nginx-ingress.yaml

Verify that the Ingress was created successfully and record its ADDRESS:

kubectl get ing
NAME           CLASS   HOSTS               ADDRESS           PORTS   AGE
nginx-server   nginx   nginx.example.com   139.224.120.211   80      2d
  3. Download periodic-client.yaml, replace <nginx-ingress-address> with the ADDRESS of the Ingress recorded in the previous step, and then run the following command to create a client Pod that sends requests to the NGINX service periodically (with a 1-hour cycle):
kubectl apply -f periodic-client.yaml

It will generate periodic traffic that repeats on a 1-hour cycle.

Since the algorithm needs a certain amount of data to learn from, it is recommended to let the client run for at least 24 hours before proceeding to the next step.

Train time series forecasting model

Please refer to the user guide and use this configuration to train the time series forecasting model. Then run the following command to save the model and its accompanying files as a ConfigMap for subsequent algorithm jobs to use, replacing <model-save-path> with the directory where the model was actually saved:

kubectl create cm -n kapacity-system example-model --from-file=<model-save-path>
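
You can confirm that the ConfigMap was created and contains the expected files (the exact file names depend on your training output):

kubectl describe cm -n kapacity-system example-model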

Create IHPA with dynamic predictive portrait provider

Download dynamic-predictive-portrait-sample.yaml, which looks like this:

apiVersion: autoscaling.kapacitystack.io/v1alpha1
kind: IntelligentHorizontalPodAutoscaler
metadata:
  name: predictive-sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  portraitProviders:
  - type: Dynamic
    priority: 1
    dynamic:
      portraitType: Predictive
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: AverageValue
            averageValue: 1m
      - type: Pods
        pods:
          metric:
            name: kube_pod_status_ready
          target:
            type: NA
      - name: qps
        type: Object
        object:
          describedObject:
            apiVersion: networking.k8s.io/v1
            kind: Ingress
            name: nginx-server
          metric:
            name: nginx_ingress_controller_requests_rate
          target:
            type: NA
      algorithm:
        type: ExternalJob
        externalJob:
          job:
            type: CronJob
            cronJob:
              template:
                spec:
                  schedule: "0/30 * * * *"
                  jobTemplate:
                    spec:
                      template:
                        spec:
                          containers:
                          - name: algorithm
                            args:
                            - --tsf-model-path=/opt/kapacity/timeseries/forecasting/model
                            - --re-history-len=24H
                            - --re-time-delta-hours=8
                            - --re-test-dataset-size-in-seconds=3600
                            - --scaling-freq=10min
                            volumeMounts:
                            - name: model
                              mountPath: /opt/kapacity/timeseries/forecasting/model
                              readOnly: true
                          volumes:
                          - name: model
                            configMap:
                              name: example-model
                          restartPolicy: OnFailure
          resultSource:
            type: ConfigMap

Please replace the value of the algorithm parameter --re-time-delta-hours with the UTC offset of your time zone, e.g. 8 for UTC+8 or -7 for UTC-7.
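
If you are unsure of the offset, running the following on a machine in the relevant time zone prints it (for example +0800 for UTC+8; pass the hour part, 8, to the flag):

date +%z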

Let’s look at the metrics first. The “Traffic-Driven Replicas Prediction” algorithm needs several types of metrics to work together, so the metrics must follow the convention below:

  • The first metric should be the target resource metric of the workload, so its type can only be Resource or ContainerResource. It specifies the target resource level we expect IHPA to maintain.
  • The second metric should be the online replica count metric of the workload. The algorithm uses it to query the historical number of Ready Pods (i.e., Pods carrying traffic) of the workload. Its type can only be Pods, and the query is aggregated at the workload level by regex matching on Pod names. Kapacity supports the kube_pod_status_ready metric from kube-state-metrics out of the box. Since this metric is only used for historical queries, no target value needs to be specified, so we write the placeholder NA as its target type.
  • The third and subsequent metrics should be traffic metrics (such as QPS, RPC, etc.) that are positively correlated with the target resource metric of the workload. The algorithm performs time series forecasting on these metrics and, based on the historical resource level and replica count, predicts the replica count that will meet the target resource level in the future. These metrics can be of any type other than Resource and ContainerResource, but note that each must use the same name as was set during training. Like the previous metric, they are only used for historical queries, so no target values are needed.

Then let’s look at the algorithm parameters. Here is a brief explanation of a few key ones; more information can be found in the flags description of the algorithm script itself:

  • --re-history-len: The length of history the replica recommendation algorithm learns from. It is generally recommended to cover at least two behavioral cycles of the application.
  • --re-time-delta-hours: The UTC offset of the time zone where the application is located. The replica recommendation algorithm needs this time zone information to learn time-related features.
  • --re-test-dataset-size-in-seconds: The size of the test set used by the replica recommendation algorithm. The default is one day (86400); you only need to shorten it when the history length is less than one day, e.g. one hour (3600) in this example.
  • --scaling-freq: The granularity of the final replica prediction, i.e. the maximum frequency of actual scaling. It cannot be finer than the original prediction granularity of the time series forecasting algorithm (the freq parameter used when training the forecasting model). The algorithm resamples the original prediction by taking the maximum value within each interval of the given granularity and outputs the result. For example, if this parameter is set to 1 hour, the algorithm gives the maximum number of replicas the workload needs in each hour, so the workload scales up or down at most once an hour.

Execute the following command to create this IHPA:

kubectl apply -f dynamic-predictive-portrait-sample.yaml

Verify results

  1. Verify that IHPA automatically created a CronJob to run the algorithm task, and the last task ran successfully:
kubectl get cj -n kapacity-system
NAME                                   SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
default-predictive-sample-predictive   0/30 * * * *   False     1        26m             2d1h
kubectl get job -n kapacity-system
NAME                                            COMPLETIONS   DURATION   AGE
default-predictive-sample-predictive-28286564   1/1           16s        28m
  2. Verify that the algorithm result was successfully written to the predictive horizontal portrait of IHPA:
kubectl get hp predictive-sample-predictive -o yaml
apiVersion: autoscaling.kapacitystack.io/v1alpha1
kind: HorizontalPortrait
metadata:
  name: predictive-sample-predictive
  namespace: default
  ...
spec:
  ...
status:
  conditions:
  - lastTransitionTime: "2023-10-25T11:00:00Z"
    message: portrait has been successfully generated
    observedGeneration: 1
    reason: SucceededGeneratePortrait
    status: "True"
    type: PortraitGenerated
  portraitData:
    expireTime: "2023-10-25T11:30:00Z"
    timeSeries:
      timeSeries:
      - replicas: 4
        timestamp: 1698231600
      - replicas: 3
        timestamp: 1698232200
      - replicas: 2
        timestamp: 1698232800
    type: TimeSeries
  3. Verify that IHPA has adjusted the replica count of the workload according to the prediction result of the algorithm:
kubectl describe ihpa predictive-sample
...
Events:
  Type     Reason                Age                 From             Message
  ----     ------                ----                ----             -------
  Warning  NoValidPortraitValue  29m (x10 over 85m)  ihpa_controller  no valid portrait value for now
  Normal   UpdateReplicaProfile  25m                 ihpa_controller  update ReplicaProfile with onlineReplcas: 1 -> 4, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal   UpdateReplicaProfile  15m                 ihpa_controller  update ReplicaProfile with onlineReplcas: 4 -> 3, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal   UpdateReplicaProfile  5m9s                ihpa_controller  update ReplicaProfile with onlineReplcas: 3 -> 2, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0

Cleanup

Run the following commands to clean up all the resources:

kubectl delete -f dynamic-predictive-portrait-sample.yaml
kubectl delete -f periodic-client.yaml
kubectl delete -f nginx-ingress.yaml
kubectl delete -f nginx-statefulset.yaml

2.1.4 - Multi-Stage Gray Scaling

Before you begin

You need to have a Kubernetes cluster with Kapacity installed.

Run sample workload

Download nginx-statefulset.yaml and run the following command to start an NGINX workload:

kubectl apply -f nginx-statefulset.yaml

Check if the workload is running:

kubectl get po
NAME      READY   STATUS    RESTARTS   AGE
nginx-0   1/1     Running   0          5s

Create IHPA with gray scale-down strategy

Download gray-strategy-sample.yaml, which looks like this:

apiVersion: autoscaling.kapacitystack.io/v1alpha1
kind: IntelligentHorizontalPodAutoscaler
metadata:
  name: gray-strategy-sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  portraitProviders:
  - priority: 1
    static:
      replicas: 1
    type: Static
  - cron:
      crons:
      - name: cron-1
        replicas: 5
        start: 0 * * * *
        end: 10 * * * *
    priority: 2
    type: Cron
  behavior:
    scaleDown:
      grayStrategy:
        grayState: Cutoff         # GrayState is the desired state of pods that are in the gray stage.
        changeIntervalSeconds: 30 # ChangeIntervalSeconds is the interval between two consecutive gray changes.
        changePercent: 50         # ChangePercent is the percentage of the total replica change used to calculate the number of pods to change in each gray change.
        observationSeconds: 60    # ObservationSeconds is the additional observation time after the gray change reaches 100%.

This IHPA contains the following two portrait providers:

  • A static portrait provider with priority 1 and a static replica number 1.
  • A cron portrait provider with priority 2 and replica number 5, which takes effect from the 0th to the 10th minute of every hour.

Note that the cron portrait provider’s priority is higher than the static one’s, so the replica number it provides overrides the one provided by the static portrait provider when it is in effect.

Run the following command to create the IHPA:

kubectl apply -f gray-strategy-sample.yaml
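
While the strategy runs, you can watch the pod state transitions live in another terminal (optional; it uses the same pod-state label column as the commands below):

kubectl get po -L 'kapacitystack.io/pod-state' -w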

Verify results

We can see that the cron portrait provider takes effect from the 0th to the 10th minute of every hour, and the replicas of the workload are scaled up from 1 to 5:

kubectl get po -L 'kapacitystack.io/pod-state' -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES   POD-STATE
nginx-0   1/1     Running   0          50m   10.1.5.52   docker-desktop   <none>           1/1
nginx-1   1/1     Running   0          56s   10.1.5.68   docker-desktop   <none>           1/1
nginx-2   1/1     Running   0          54s   10.1.5.69   docker-desktop   <none>           1/1
nginx-3   1/1     Running   0          52s   10.1.5.70   docker-desktop   <none>           1/1
nginx-4   1/1     Running   0          50s   10.1.5.71   docker-desktop   <none>           1/1

The number of endpoints of the workload’s Service also changes to 5:

kubectl get ep nginx
NAME    ENDPOINTS                                            AGE
nginx   10.1.5.52:80,10.1.5.68:80,10.1.5.69:80 + 2 more...   3d3h

At the 10th minute, we can see that 2 pods change to the Cutoff state and are removed from the endpoints of the Service:

kubectl get po -L 'kapacitystack.io/pod-state' -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES   POD-STATE
nginx-0   1/1     Running   0          51m   10.1.5.52   docker-desktop   <none>           1/1
nginx-1   1/1     Running   0          63s   10.1.5.68   docker-desktop   <none>           1/1
nginx-2   1/1     Running   0          61s   10.1.5.69   docker-desktop   <none>           1/1
nginx-3   1/1     Running   0          59s   10.1.5.70   docker-desktop   <none>           0/1               Cutoff
nginx-4   1/1     Running   0          57s   10.1.5.71   docker-desktop   <none>           0/1               Cutoff
kubectl get ep nginx
NAME    ENDPOINTS                                AGE
nginx   10.1.5.52:80,10.1.5.68:80,10.1.5.69:80   3d3h

After waiting for another 30 seconds, we can see that all 4 newly created pods are in the Cutoff state and have been removed from the endpoints of the Service:

kubectl get po -L 'kapacitystack.io/pod-state' -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES   POD-STATE
nginx-0   1/1     Running   0          51m   10.1.5.52   docker-desktop   <none>           1/1
nginx-1   1/1     Running   0          96s   10.1.5.68   docker-desktop   <none>           0/1               Cutoff
nginx-2   1/1     Running   0          94s   10.1.5.69   docker-desktop   <none>           0/1               Cutoff
nginx-3   1/1     Running   0          92s   10.1.5.70   docker-desktop   <none>           0/1               Cutoff
nginx-4   1/1     Running   0          90s   10.1.5.71   docker-desktop   <none>           0/1               Cutoff
kubectl get ep nginx
NAME    ENDPOINTS      AGE
nginx   10.1.5.52:80   3d3h

After waiting for another minute, we can see that the replicas of the workload are finally scaled down to 1:

kubectl get po -L 'kapacitystack.io/pod-state' -o wide
NAME      READY   STATUS    RESTARTS   AGE    IP          NODE             NOMINATED NODE   READINESS GATES   POD-STATE
nginx-0   1/1     Running   0          52m    10.1.5.52   docker-desktop   <none>           1/1

You can also see the entire scaling process by checking the events of the IHPA:

kubectl describe ihpa gray-strategy-sample
...
Events:
  Type    Reason                Age    From             Message
  ----    ------                ----   ----             -------
  Normal  CreateReplicaProfile  3m53s  ihpa_controller  create ReplicaProfile with onlineReplcas: 1, cutoffReplicas: 0, standbyReplicas: 0
  Normal  UpdateReplicaProfile  2m44s  ihpa_controller  update ReplicaProfile with onlineReplcas: 1 -> 5, cutoffReplicas: 0 -> 0, standbyReplicas: 0 -> 0
  Normal  UpdateReplicaProfile  104s   ihpa_controller  update ReplicaProfile with onlineReplcas: 5 -> 3, cutoffReplicas: 0 -> 2, standbyReplicas: 0 -> 0
  Normal  UpdateReplicaProfile  74s    ihpa_controller  update ReplicaProfile with onlineReplcas: 3 -> 1, cutoffReplicas: 2 -> 4, standbyReplicas: 0 -> 0
  Normal  UpdateReplicaProfile  14s    ihpa_controller  update ReplicaProfile with onlineReplcas: 1 -> 1, cutoffReplicas: 4 -> 0, standbyReplicas: 0 -> 0

Cleanup

Run the following commands to clean up all the resources:

kubectl delete -f gray-strategy-sample.yaml 
kubectl delete -f nginx-statefulset.yaml