User Guide

Learn how to use Kapacity in detail

1 - IHPA (Intelligent HPA)

Learn how to use IHPA in detail

This section contains IHPA’s core concepts and some detailed usages.

If you are not yet familiar with IHPA, read the introduction first.

If you want to get started with IHPA's core features quickly, follow the quick start guide.

1.1 - Concepts

Core concepts of IHPA

1.1.1 - IHPA Architecture Overview

Note: Most of the content on this page was translated from Chinese by AI. We would greatly appreciate your help in improving the translation by editing this page (click the corresponding link on the right side).

Component Architecture

Legend:

  • Components of IHPA itself are shown with a blue background
  • Kubernetes CRs of IHPA are shown with a green background
  • Other Kubernetes resources that IHPA depends on are shown with a yellow background
  • Related external systems are shown with a white background

Full names of the abbreviations:

  • IHPA: IntelligentHorizontalPodAutoscaler
  • HPortrait: HorizontalPortrait
  • CM: ConfigMap

Component Overview

Replica Controller

Replica Controller is the execution-layer component of IHPA. It controls the replica count and state of the target workload, and supports operations such as scaling Pods, cutting off their traffic, and activating them by integrating with different native and third-party components.

IHPA Controller

IHPA Controller is the control-plane component of IHPA. It directly accepts IHPA configurations from users or external systems (including the target workload, metrics, algorithms, change and stability configurations, etc.), issues profiling tasks and integrates the profiling results, and then performs multi-level, batched scaling based on those results.

HPortrait Controller

HPortrait Controller is the built-in management component for horizontal scaling algorithms. It runs and manages the workflows of different scaling algorithms for different workloads and converts their outputs into a standard profiling format. The concrete algorithm sub-tasks are scheduled for execution as separate Kubernetes Jobs or as tasks on other big data/algorithm platforms. These sub-tasks fetch historical and real-time metric data from external monitoring systems to compute and generate the profiling results.

Specifically, the logic of some simple algorithms (such as reactive algorithms) is implemented directly in this component, without going through separate algorithm sub-tasks.
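For reference, such a reactive rule can be thought of along the lines of the standard HPA proportional formula. The following is only a minimal sketch of that formula, not Kapacity's actual implementation:

import math

def reactive_desired_replicas(current_replicas: int,
                              current_metric_value: float,
                              target_metric_value: float) -> int:
    """HPA-style proportional rule: scale the replica count by the ratio of
    the observed metric to its target, rounding up."""
    return math.ceil(current_replicas * current_metric_value / target_metric_value)

# Example: 4 replicas at 80% average CPU with a 50% target -> 7 replicas.
print(reactive_desired_replicas(4, 0.80, 0.50))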

Metrics Provider Server

Metrics Provider Server is the unified monitoring metrics query component. It abstracts away the differences between underlying monitoring systems and provides a unified metrics query service to externally running components (such as algorithm tasks).

The API it provides is similar to the Kubernetes Metrics API; the difference is that it supports both real-time and historical metric queries.

Agent (not included yet)

Agent is the agent component running on the nodes of the Kubernetes cluster. It is mainly responsible for operations that require interaction with the underlying operating system, such as activating and deactivating Pods.

1.1.2 - Principles of Predictive Scaling

Note: Most of the content on this page was translated from Chinese by AI. We would greatly appreciate your help in improving the translation by editing this page (click the corresponding link on the right side).

Advantages of Predictive Scaling

From the comparison in the figure above, we can identify several advantages of predictive scaling over reactive scaling:

  • Predictive scaling can respond to traffic changes in advance
  • Predictive scaling can control resource levels more stably
  • Predictive scaling has higher accuracy and can use resources more effectively

Traffic-Driven Replicas Prediction

In this section, we introduce in detail the design concept and working principle of IHPA’s “traffic-driven replica prediction” algorithm.

Why Traffic-Driven

For online applications, capacity (resource) indicators (such as CPU utilization) are strongly correlated with traffic, i.e., traffic variations drive changes in capacity indicators. Predicting capacity through traffic, rather than directly predicting capacity indicators, has the following advantages:

  • Traffic indicators are the most upstream indicators, which change before capacity indicators, and respond quickly.
  • Capacity indicators are easily disturbed by a variety of factors (such as the application’s own code issues, host performance, etc.), while traffic indicators are only directly related to application characteristics (such as user usage habits), making them easier to predict over time.

Modeling the Relationship between Traffic, Capacity and Replicas

In order to convert the replica count prediction problem into a traffic prediction problem, we designed a Linear-Residual Model to find the association function between traffic, capacity, and replica count, as shown in the following figure:

In this model, we set resource utilization as the target indicator, because controlling the application's resource level is the ultimate goal of using elastic scaling and is the most intuitive target to reason about.

However, unlike the reactive scaling algorithm of Kubernetes HPA, this algorithm does not consider the target indicator alone, even though resource utilization is set as the target. It takes historical traffic (multiple traffic series are supported), historical resource utilization, and historical replica count as inputs. These indicators first go through a linear model, which learns the linear association between the three and yields the association function shown in the figure above. They then go through a residual model together with other information (currently only time information), which corrects the association function in light of that information and can learn the more complex non-linear associations between traffic, capacity, and replica count.

Here is a simple example of what the residual model contributes. Suppose an online application runs an internal scheduled task every Sunday morning that consumes additional CPU but is unrelated to the external traffic the application handles. The linear model alone cannot learn this behavior. After introducing the residual model, this pattern can be learned from the time information: given the same traffic and replica count as at other times, the resulting function will output higher CPU consumption on Sunday mornings, which matches the actual situation.

In the current algorithm implementation, we use ElasticNet as the linear model and LightGBM as the residual model. Both are traditional machine learning algorithms that do not depend heavily on GPUs, have lower overhead than deep learning algorithms, and still achieve good results. Of course, you can also replace the concrete implementation of these models to suit your own needs, and you are welcome to contribute implementations that you find superior in certain scenarios.

After using this model to obtain the association function, we can convert the replica count prediction problem into a traffic prediction problem: given the target resource utilization, simply feed in the predicted traffic to get the predicted replica count (the count that maintains the target average resource utilization under the predicted traffic).
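To make this inversion concrete, here is a deliberately simplified illustration (the association function actually learned by the model is more complex and non-linear). Suppose the learned relationship were approximately $$u \approx \frac{c \cdot q}{N}$$ where $u$ is the average resource utilization, $q$ is the traffic, $N$ is the replica count, and $c$ is a learned cost coefficient. Then for a target utilization $u^{*}$ and predicted traffic $\hat q$, the predicted replica count is $$N_{pred} = \left\lceil \frac{c \cdot \hat q}{u^{*}} \right\rceil$$ For example, with $c = 0.005$, $\hat q = 1200$ requests/s, and $u^{*} = 0.6$, this gives $N_{pred} = \lceil 0.005 \times 1200 / 0.6 \rceil = 10$ replicas.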

Model Details

Denote the workload information (including traffic, resource usage, and replicas) as $x$, the target indicator (resource usage) as $y$, and the meta information (including time, etc.) as $\omega$. We first use a linear model to characterize the skeleton of the target indicator: $$\hat y_l = L(x)$$ Then we calculate the error of the linear model: $$e = y - \hat y_l$$ Next, we combine the meta information with the linear prediction and estimate this error with a residual model: $$\hat e = R(\hat y_l,\omega)$$ Finally, we get the estimate of $y$ as: $$\hat y_r = \hat y_l + \hat e$$ where $L$ and $R$ can be any linear and residual model respectively.
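As a rough illustration of this decomposition (a minimal sketch with synthetic data, not the actual Kapacity implementation; the feature layout is an assumption), fitting the two stages with ElasticNet and LightGBM could look like this:

import numpy as np
from sklearn.linear_model import ElasticNet
from lightgbm import LGBMRegressor

# Synthetic stand-ins: x holds traffic / resource usage / replica history,
# y is the target indicator (resource utilization), omega is meta information
# such as an encoded day-of-week.
rng = np.random.default_rng(0)
x = rng.random((1000, 5))
omega = rng.integers(0, 7, size=(1000, 1)).astype(float)
y = x @ np.array([0.4, 0.3, 0.1, 0.1, 0.1]) + 0.05 * omega.ravel()

# 1. The linear model captures the skeleton of the target indicator: y_hat_l = L(x).
linear = ElasticNet(alpha=0.01)
linear.fit(x, y)
y_hat_l = linear.predict(x)

# 2. The residual model estimates the linear model's error from the linear
#    prediction plus meta information: e_hat = R(y_hat_l, omega).
e = y - y_hat_l
residual_features = np.hstack([y_hat_l.reshape(-1, 1), omega])
residual = LGBMRegressor(n_estimators=100)
residual.fit(residual_features, e)
e_hat = residual.predict(residual_features)

# 3. The final estimate combines both parts: y_hat_r = y_hat_l + e_hat.
y_hat_r = y_hat_l + e_hat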

Time Series Forecasting of Traffic

We designed a deep learning model called Swish Net for Time Series Forecasting to predict traffic indicators over time. The model is specifically optimized for IHPA's use case and has the following two main characteristics:

  • Lightweight: The model has a relatively simple structure, which results in a smaller model size and lower training cost. For example, for a single traffic series, predicting 12 future points from 12 historical points (a 2-hour-ahead prediction at 10-minute granularity), the trained model is less than 1 MiB in size. On a PC-class CPU, an epoch takes only about 1 minute, and training can be completed in about 1-2 hours.
  • Better performance on production traffic forecasting: We compared this model with other common deep learning time series forecasting models on a production traffic dataset. The results show that this model outperforms the others in production traffic time series forecasting, as shown in the following table:
Model      MAE     RMSE
DeepAR     1.734   31.315
N-BEATS    1.851   41.681
ours       1.597   28.732

Model Details

Assume that the historical traffic $y_{1:T,i}$ is known, the future real traffic is $y_{T+1:T+\tau,i}$, the predicted traffic is $\hat y_{T+1:T+\tau,i}$, and the category of the traffic (such as the app) is $i$. A traffic time series exhibits cyclical, trend, and autoregressive characteristics. We design the following modules to capture these characteristics and aggregate the information to make predictions for the future (a condensed code sketch follows the list):

  1. The Embedding Layer of the model projects category information and time information into high-dimensional vectors. The category information expresses the differences between different sequences, and the time information can express the cyclicity of the time series: $$V_i = Embed(i)$$ $$V_t = Embed(t)$$

  2. The element-wise product of the time and category embeddings with the historical traffic further extracts the differential and cyclical features specific to each sequence: $$\tilde V_i = V_i \odot y_{1:T}$$ $$\tilde V_t = V_t \odot y_{1:T}$$

  3. The difference between adjacent time steps of the traffic time series eliminates the trend and better expresses the periodicity of the series; the trend feature itself is still carried by the original sequence: $$\tilde y_{1:T} = y_{2:T} - y_{1:T-1}$$

  4. The input and structure of the Multilayer Perceptron (MLP) layer are as follows: $$in = concat(V_i,V_t,\tilde V_i,\tilde V_t,Embed(i),Embed(t),\tilde y_{1:T},y_{1:T})$$ $$\hat y_{T+1:T+\tau,i} = MLP(in)$$ The MLP aggregates the information from the feature modules above and predicts the time series for future time steps.

  5. The loss function of the model is MSE: $$loss = \sum_{i,t}(y_{i,t}-\hat y_{i,t})^2$$
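The following condensed PyTorch sketch strings these modules together. It is only an illustration of the structure described above; the layer sizes and names are assumptions and do not mirror the actual Kapacity model:

import torch
import torch.nn as nn

class SwishNetSketch(nn.Module):
    """Category/time embeddings, element-wise interaction with the history,
    first-order differencing, and an MLP head, following steps 1-4 above."""

    def __init__(self, num_series, num_time_slots, context_len, pred_len, hidden=64):
        super().__init__()
        self.embed_series = nn.Embedding(num_series, context_len)    # V_i
        self.embed_time = nn.Embedding(num_time_slots, context_len)  # V_t
        # Concatenated input: V_i, V_t, V_i*y, V_t*y, y, diff(y)
        in_dim = 5 * context_len + (context_len - 1)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.SiLU(),                      # the "Swish" activation
            nn.Linear(hidden, pred_len),
        )

    def forward(self, y_hist, series_id, time_id):
        v_i = self.embed_series(series_id)         # (batch, T)
        v_t = self.embed_time(time_id)             # (batch, T)
        diff = y_hist[:, 1:] - y_hist[:, :-1]      # step 3: first-order difference
        feats = torch.cat([v_i, v_t, v_i * y_hist, v_t * y_hist, y_hist, diff], dim=-1)
        return self.mlp(feats)                     # (batch, pred_len)

# Toy usage: predict 12 future points from 12 historical points.
model = SwishNetSketch(num_series=3, num_time_slots=144, context_len=12, pred_len=12)
y_hist = torch.rand(8, 12)
series_id = torch.zeros(8, dtype=torch.long)
time_id = torch.zeros(8, dtype=torch.long)
pred = model(y_hist, series_id, time_id)
loss = nn.functional.mse_loss(pred, torch.rand(8, 12))  # step 5: MSE loss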

Full Algorithm Workflow

Finally, we can link the above two models with the relevant data sources to get the complete workflow of the IHPA predictive autoscaling algorithm, as shown in the following diagram:

Legend:

  • The blue line represents the offline workflow, which requires a large amount of data, GPU resources, and a longer execution time, and runs at a lower frequency.
  • The yellow line represents the online workflow, which requires a moderate amount of data, only CPU, and a shorter execution time, and runs at a higher frequency.

Note:

  • The “app” here is an abstract concept representing the smallest unit of autoscaling, typically a specific workload such as a Deployment.
  • The “all traffics” here does not mean all monitorable traffic in the monitoring system, but rather the traffic components that are significantly positively correlated with the target resource usage indicator; these can be selected as needed for the specific application scenario.
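Putting the two models together, the online path can be summarized in a few steps. The sketch below uses placeholder callables; forecaster and association_fn are hypothetical stand-ins for the trained traffic forecasting model and the learned association function:

import math

def online_scaling_step(forecaster, association_fn, recent_traffic, target_utilization):
    """One online pass: forecast the traffic for the coming horizon, then use
    the learned traffic/capacity/replicas association to find, for each
    predicted point, the replica count that keeps utilization at the target,
    and take the peak requirement over the horizon."""
    predicted_traffic = forecaster(recent_traffic)
    replicas_per_step = [association_fn(q, target_utilization) for q in predicted_traffic]
    return math.ceil(max(replicas_per_step))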

2 - Algorithm

Learn general algorithms of Kapacity in detail

2.1 - Train Time Series Forecasting Model

Note: Most of the content on this page was translated from Chinese by AI. We would greatly appreciate your help in improving the translation by editing this page (click the corresponding link on the right side).

Before you begin

This document will guide you through training the time series forecasting deep learning model used by Kapacity.

Before getting started, please make sure that Conda has been installed in your environment.

Install dependencies

Execute the following command to download the code of the Kapacity algorithm version you are using, install algorithm dependencies, and activate the algorithm runtime environment:

git clone --depth 1 -b algorithm-<your-kapacity-algorithm-version> https://github.com/traas-stack/kapacity.git
cd kapacity/algorithm
conda env create -f environment.yml
conda activate kapacity

Prepare configuration file

The training script will read parameters related to dataset fetching and model training from an external configuration file, so we need to prepare this configuration file in advance. You can download this example configuration file tsf-model-train-config.yaml and modify its content as needed. The content of this file is as follows:

targets:
- workloadNamespace: default
  workloadRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  historyLength: 24H
  metrics:
  - name: qps
    type: Object
    object:
      metric:
        name: nginx_ingress_controller_requests_rate
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: nginx-server
freq: 10min
predictionLength: 3
contextLength: 7
hyperParams:
  learningRate: 0.001
  epochs: 100
  batchSize: 32

Here is an explanation of each field:

  • targets: The list of workloads to train on and their metric information (the algorithm supports training multiple metrics of multiple workloads into one model, but this also makes the model larger). If you wish to prepare the dataset manually instead of having the algorithm fetch it automatically, you can leave this field empty.
    • workloadNamespace: The namespace where the workload to be trained resides.
    • workloadRef: The object reference identifying the workload to be trained.
    • historyLength: The length of metrics history to fetch as the dataset; three time units are supported: min (minutes), H (hours), and D (days). It is generally recommended to cover at least two complete cycles of the periodic metrics.
    • metrics: The list of metrics to train for this workload, in the same format as IHPA's metrics field. Note that you must set a different name for each metric to distinguish different metrics of the same workload within the same model.
  • freq: The granularity of the model, i.e., the unit of the predictionLength and contextLength parameters below. The currently supported values are 1min, 10min, 1H, and 1D. Note that this parameter does not affect the model size. (A short example of how these fields combine follows this list.)
  • predictionLength: The number of points the model predicts; the final prediction length is predictionLength * freq. The larger this parameter, the larger the model. It is generally not recommended to set it too large, because the further out a prediction point is, the lower its accuracy.
  • contextLength: The number of historical points the model references during prediction (inference); the final context length is contextLength * freq. The larger this parameter, the larger the model.
  • hyperParams: Hyperparameters of the deep learning model; these generally do not need to be adjusted.
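As a quick check of how these fields combine, here is an illustrative snippet (not part of the training script) using the values from the example configuration above:

import pandas as pd

freq = pd.Timedelta("10min")      # freq: 10min
prediction_length = 3             # predictionLength
context_length = 7                # contextLength

print(prediction_length * freq)   # 0 days 00:30:00 -> the model predicts 30 minutes ahead
print(context_length * freq)      # 0 days 01:10:00 -> the model looks back 70 minutes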

Train model

Execute the following command to train the model, making sure to replace the parameters with your actual values:

python kapacity/timeseries/forecasting/train.py \
  --config-file=<your-config-file> \
  --model-save-path=<your-model-save-path> \
  --dataset-file=<your-dataset-file> \
  --metrics-server-addr=<your-metrics-server-addr> \
  --dataloader-num-workers=<your-dataloader-num-workers>

Here is an explanation of each parameter:

  • config-file: The path of the configuration file prepared in the previous step.
  • model-save-path: The directory where the trained model will be saved.
  • dataset-file: The path of a manually prepared dataset file. Specify exactly one of this parameter and metrics-server-addr.
  • metrics-server-addr: The address of the metrics server used to fetch the dataset automatically, i.e., an accessible address of the Kapacity gRPC service. Specify exactly one of this parameter and dataset-file.
  • dataloader-num-workers: The number of subprocesses used to load the dataset. It is generally recommended to set this to the number of CPUs on the machine; if set to 0, only the main process is used for loading.

After execution, you will see training logs similar to the following. Please wait until the training is completed (the command exits normally):

2023-10-18 20:05:07,757 - INFO: Epoch: 1 cost time: 55.25944399833679
2023-10-18 20:05:07,774 - INFO: Epoch: 1, Steps: 6 | Train Loss: 188888896.6564227
Validation loss decreased (inf --> 188888896.656423).  Saving model ...
2023-10-18 20:05:51,157 - INFO: Epoch: 2 cost time: 43.38192820549011
2023-10-18 20:05:51,158 - INFO: Epoch: 2, Steps: 6 | Train Loss: 212027786.7585510
EarlyStopping counter: 1 out of 15
2023-10-18 20:06:30,055 - INFO: Epoch: 3 cost time: 38.89493203163147
2023-10-18 20:06:30,060 - INFO: Epoch: 3, Steps: 6 | Train Loss: 226666666.7703293

After the training is completed, you can see the trained model and its associated files in the model-save-path directory:

-rw-r--r-- 1 admin  staff   316K Oct 18 20:18 checkpoint.pth
-rw-r--r-- 1 admin  staff   287B Oct 18 20:04 estimator_config.yaml
-rw-r--r-- 1 admin  staff    29B Oct 18 20:04 item2id.json

3 - Use Custom Metrics

Note: Most of the content on this page was translated from Chinese by AI. We would greatly appreciate your help in improving the translation by editing this page (click the corresponding link on the right side).

Background

The Kubernetes Metrics API provides a set of common metric query interfaces within the Kubernetes system, but it only supports querying the current real-time value of a metric: it supports neither querying historical values nor aggregating Pod metrics by workload, so it cannot meet the data needs of various intelligent algorithms. Therefore, Kapacity has further abstracted and extended the Metrics API, supporting advanced capabilities such as general historical metric queries and workload-level aggregation queries while staying maximally compatible with existing usage habits.

Currently, the Metrics API provided by Kapacity supports the following two metric provider backends:

Use Prometheus as metric provider (default)

Set the startup parameter --metric-provider of Kapacity Manager to prometheus to use Prometheus as metric provider.

If you use this provider, you only need a Prometheus; there is no need to install the Kubernetes Metrics Server or other metrics adapters (including the Prometheus Adapter).

You can find the prometheus-metrics-config.yaml configuration in the ConfigMap kapacity-config in the namespace where Kapacity Manager is located. By modifying this configuration, you can fully customize the Prometheus query statements for different metric types. The format of this configuration is fully compatible with the Prometheus Adapter configuration, so if you have previously configured custom HPA metrics with the Prometheus Adapter, you can reuse that configuration directly.

The default configuration provided by Kapacity is as follows:

resourceRules:
  cpu:
    containerQuery: |-
      sum by (<<.GroupBy>>) (
        irate(container_cpu_usage_seconds_total{container!="",container!="POD",<<.LabelMatchers>>}[3m])
      )      
    readyPodsOnlyContainerQuery: |-
      sum by (<<.GroupBy>>) (
          (kube_pod_status_ready{condition="true"} == 1)
        * on (namespace, pod) group_left ()
          sum by (namespace, pod) (
            irate(container_cpu_usage_seconds_total{container!="",container!="POD",<<.LabelMatchers>>}[3m])
          )
      )      
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
  memory:
    containerQuery: |-
      sum by (<<.GroupBy>>) (
        container_memory_working_set_bytes{container!="",container!="POD",<<.LabelMatchers>>}
      )      
    readyPodsOnlyContainerQuery: |-
      sum by (<<.GroupBy>>) (
          (kube_pod_status_ready{condition="true"} == 1)
        * on (namespace, pod) group_left ()
          sum by (namespace, pod) (
            container_memory_working_set_bytes{container!="",container!="POD",<<.LabelMatchers>>}
          )
      )      
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
  window: 3m
rules: []
externalRules:
- seriesQuery: '{__name__="kube_pod_status_ready"}'
  metricsQuery: sum(<<.Series>>{condition="true",<<.LabelMatchers>>})
  name:
    as: ready_pods_count
  resources:
    overrides:
      namespace:
        resource: namespace
workloadPodNamePatterns:
- group: apps
  kind: ReplicaSet
  pattern: ^%s-[a-z0-9]+$
- group: apps
  kind: Deployment
  pattern: ^%s-[a-z0-9]+-[a-z0-9]+$
- group: apps
  kind: StatefulSet
  pattern: ^%s-[0-9]+$

To support advanced query capabilities such as workload-level aggregation queries, we have extended a few fields on top of the Prometheus Adapter configuration. Below is a brief explanation of these extended fields:

  • workloadPodNamePatterns: Some Kapacity algorithms need to query workload-level metric information, such as the total CPU usage of a workload’s Pods or the number of Ready Pods of a workload. In these cases, Kapacity aggregates Pod-level metrics by matching Pod names against regular expressions, so the name-matching rules for the Pods of each workload type must be configured through this field. If you use workload types other than those in the default configuration, you need to add the corresponding patterns to this field (see the sketch after this list).
  • readyPodsOnlyContainerQuery: Some Kapacity algorithms apply extra conditions when querying the total resource usage of a workload’s Pods, such as querying the total CPU usage of only a workload’s Ready Pods. In this case, a separate PromQL statement for this special query must be provided through this field. The default query statement provided by Kapacity is based on the metrics exposed by kube-state-metrics; you can change it to another implementation as needed.
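To illustrate how such a pattern is applied, the standalone snippet below matches a Deployment Pod name against the default Deployment pattern (assuming the %s placeholder is filled with the workload name before matching):

import re

# Default pattern for Deployments from the configuration above.
deployment_pattern = "^%s-[a-z0-9]+-[a-z0-9]+$"

workload_name = "nginx"
pod_name = "nginx-7bf8c77b5b-x2x9q"

regex = re.compile(deployment_pattern % workload_name)
print(bool(regex.match(pod_name)))  # True: the Pod is attributed to the "nginx" Deployment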

Use Kubernetes Metrics API as metric provider

Set the startup parameter --metric-provider of Kapacity Manager to metrics-api to use the Kubernetes Metrics API as the metric provider.

If you use this provider, you need to install the Kubernetes Metrics Server or another metrics adapter (such as the Prometheus Adapter) to abstract away the differences between underlying monitoring systems.

Note, however, that this backend supports neither querying historical metric values nor aggregating Pod metrics by workload, so its applicability is very limited: it is only suitable for scenarios that use simple algorithms, such as Reactive Scaling.

4 - FAQ

Note: Most of the content on this page was translated from Chinese by AI. We would greatly appreciate your help in improving the translation by editing this page (click the corresponding link on the right side).

IHPA

Predictive algorithm job failed with error replicas estimation failed

This error means that the “Traffic, Capacity and Replicas Relationship Modeling” algorithm failed to produce a usable model. The following methods can be tried to resolve the issue:

  • Try increasing the length of historical data by adjusting the algorithm job parameter --re-history-len.
  • Using the detailed model evaluation information returned with the error, try relaxing the model validation requirements by adjusting the algorithm job parameters --re-min-correlation-allowed and --re-max-mse-allowed. Be aware, however, that if the relaxed values differ too much from the defaults, the accuracy of the model may be hard to guarantee.