User Guide
- 1: IHPA (Intelligent HPA)
- 1.1: Concepts
- 1.1.1: IHPA Architecture Overview
- 1.1.2: Principles of Predictive Scaling
- 2: Algorithm
- 3: Use Custom Metrics
- 4: FAQ
1 - IHPA (Intelligent HPA)
This section covers IHPA's core concepts and detailed usage.
If you don't yet know what IHPA is, read the introduction.
If you want to start using IHPA's core features quickly, follow the quick start guide.
1.1 - Concepts
1.1.1 - IHPA Architecture Overview
Component Architecture
Legend:
- Components of IHPA itself are in blue background
- Kubernetes CRs of IHPA are in green background
- Other Kubernetes resources that IHPA depends on are in yellow background
- Related external systems of IHPA are in white background
Full name of some abbreviations:
- IHPA: IntelligentHorizontalPodAutoscaler
- HPortrait: HorizontalPortrait
- CM: ConfigMap
Component Overview
Replica Controller
Replica Controller is IHPA's execution-layer component. It controls the replica count and state of specific workloads, and supports operations such as scaling, traffic cutoff, and Pod activation by integrating with different native and third-party components.
IHPA Controller
IHPA Controller is IHPA's control-plane component. It accepts IHPA configurations directly from users or external systems (including the target workload, metrics, algorithms, and change and stability configurations), issues profiling tasks and integrates the profiling results, and then performs multi-level, batched scaling based on those results.
HPortrait Controller
HPortrait Controller is the built-in management component for horizontal scaling algorithms. It runs and manages the workflows of different scaling algorithms for different workloads and converts their outputs into a standard profiling format. Specific algorithm subtasks are scheduled for execution as separate Kubernetes Jobs or as tasks on other big data/algorithm platforms; these subtasks fetch historical and real-time metric data from external monitoring systems to compute and generate profiling results.
The logic of some simple algorithms (such as reactive algorithms) is implemented directly in this component rather than in separate algorithm subtasks.
Metrics Provider Server
Metrics Provider Server is the unified monitoring-metrics query component. It abstracts away the differences between underlying monitoring systems and provides a unified metrics query service to other components (such as algorithm tasks).
The API it provides is similar to the Kubernetes Metrics API, with one key difference: it supports both real-time and historical metric queries.
Agent (not included yet)
Agent is the agent component running on the nodes of the Kubernetes cluster. It is mainly responsible for operations that require interaction with the underlying operating system, such as activating and deactivating Pods.
1.1.2 - Principles of Predictive Scaling
Advantages of Predictive Scaling
Comparing the two approaches in the figure above, we can identify several advantages of predictive scaling over reactive scaling:
- Predictive scaling can respond to traffic changes in advance
- Predictive scaling can control resource levels more stably
- Predictive scaling has higher accuracy and can use resources more effectively
Traffic-Driven Replicas Prediction
This section introduces the design and working principles of IHPA's "traffic-driven replica prediction" algorithm in detail.
Why Traffic-Driven
For online applications, capacity (resource) indicators (such as CPU utilization) are strongly correlated with traffic, i.e., traffic variations drive changes in capacity indicators. Predicting capacity through traffic, rather than directly predicting capacity indicators, has the following advantages:
- Traffic indicators are the most upstream indicators: they change before capacity indicators and react quickly.
- Capacity indicators are easily disturbed by a variety of factors (such as the application’s own code issues, host performance, etc.), while traffic indicators are only directly related to application characteristics (such as user usage habits), making them easier to predict over time.
Modeling the Relationship between Traffic, Capacity and Replicas
In order to convert the replica count prediction problem into a traffic prediction problem, we designed a Linear-Residual Model to find the association function between traffic, capacity, and replica count, as shown in the following figure:
In this model, we set resource utilization as the target indicator, because controlling the application's resource level is the ultimate goal of elastic scaling and is the most intuitive measure.
However, unlike the reactive scaling algorithm of Kubernetes HPA, this algorithm does not consider the target indicator alone. It takes historical traffic (one or more traffic series), historical resource utilization, and historical replica count as inputs. These indicators first pass through a linear model, which learns the linear association between the three and produces the association function shown in the figure above. They then pass through a residual model together with other information (currently only time information), which corrects the association function and can learn the complex non-linear relationships between traffic, capacity, and replica count.
A simple example illustrates the main purpose of the residual model. Suppose an online application runs an internal scheduled task every Sunday morning that consumes additional CPU but is unrelated to the external traffic the application handles. The linear model alone cannot learn this pattern. With the residual model, the algorithm can learn it from the time information: given the same traffic and replica count as at other times, the corrected function outputs higher CPU consumption on Sunday mornings, which matches the actual behavior.
In the current implementation, we use ElasticNet as the linear model and LightGBM as the residual model. Both are traditional machine-learning algorithms: they do not depend heavily on GPUs and have a much lower usage overhead than deep-learning approaches, while still achieving good results. You can also replace these models with your own implementations, and contributions of implementations that work better in certain scenarios are welcome.
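To make the two-stage structure concrete, here is a minimal sketch of the fitting procedure using scikit-learn's ElasticNet and LightGBM's LGBMRegressor (the two libraries named above). The feature layout, helper names, and hyperparameters are illustrative assumptions, not Kapacity's actual algorithm code:

```python
# Minimal sketch of the Linear-Residual Model, assuming per-time-step features
# are already aligned. Illustrative only; not the actual Kapacity implementation.
import numpy as np
from sklearn.linear_model import ElasticNet
from lightgbm import LGBMRegressor

def fit_linear_residual(x, y, omega):
    """x: traffic and replica features, shape (n, d)
    y: observed resource utilization, shape (n,)
    omega: meta information such as time features, shape (n, m)"""
    linear = ElasticNet()
    linear.fit(x, y)                       # skeleton of the target indicator
    y_hat_l = linear.predict(x)
    e = y - y_hat_l                        # error of the linear model
    residual = LGBMRegressor()
    residual.fit(np.column_stack([y_hat_l, omega]), e)  # correct with meta info
    return linear, residual

def predict_utilization(linear, residual, x, omega):
    y_hat_l = linear.predict(x)
    e_hat = residual.predict(np.column_stack([y_hat_l, omega]))
    return y_hat_l + e_hat                 # corrected utilization estimate
```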
After obtaining the association function from this model, we can convert the replica count prediction problem into a traffic prediction problem: given the target resource utilization, simply feed in the predicted traffic to get the predicted replica count (the count that maintains the target average resource utilization under the predicted traffic).
Model Details
Denote the workload information (including traffic, resource usage, and replicas) as $x$, the target indicator (resource usage) as $y$, and the meta information (including time, etc.) as $\omega$. We first use a linear model to characterize the skeleton of the target indicator: $$\hat y_l = L(x)$$ Then we calculate the error of the linear model: $$e = y - \hat y_l$$ Next, we combine the meta information with the linear model's output and fit the error with a residual model: $$\hat e = R(\hat y_l,\omega)$$ Finally, we get the estimate of $y$ as: $$\hat y_r = \hat y_l + \hat e$$ where $L$ and $R$ can be any linear and residual models.
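Given the fitted association function, converting a traffic prediction into a replica count reduces to a one-dimensional search: find the smallest replica count whose estimated utilization does not exceed the target. A hedged sketch, reusing the hypothetical `predict_utilization` helper and (traffic, replicas) feature layout from the sketch above:

```python
# Hypothetical inversion of the association function; not the actual
# Kapacity implementation.
import numpy as np

def replicas_for_traffic(linear, residual, traffic, omega, target_util,
                         max_replicas=1000):
    """traffic: a single predicted traffic value; omega: meta info, shape (1, m)."""
    for n in range(1, max_replicas + 1):
        x = np.array([[traffic, n]])       # assumed feature layout
        util = predict_utilization(linear, residual, x, omega)
        if util[0] <= target_util:
            return n                       # smallest count meeting the target
    return max_replicas                    # ceiling if the target is unreachable
```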
Time Series Forecasting of Traffic
We designed a deep learning model called Swish Net for Time Series Forecasting to predict traffic indicators over time. This model is specifically optimized for the use case of IHPA and has the following two main characteristics:
- Lightweight: the model has a relatively simple structure, which results in a smaller model size and lower training cost. For example, for a single traffic prediction, predicting 12 future points from 12 historical points (a 2-hour prediction at 10-minute precision), the trained model is smaller than 1 MiB. Training on a PC-class CPU, an epoch takes only about 1 minute, and training completes in about 1-2 hours.
- Better performance on production traffic forecasting: we compared this model with other common deep-learning time series forecasting models on a production traffic dataset. The results show that it outperforms the other models on this task, as shown in the following table (the error metrics are defined below the table):
| | MAE | RMSE |
| --- | --- | --- |
| DeepAR | 1.734 | 31.315 |
| N-BEATS | 1.851 | 41.681 |
| ours | 1.597 | 28.732 |
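For reference, the two columns are the standard mean absolute error and root mean squared error over all $N$ predicted points: $$MAE = \frac{1}{N}\sum_{i,t}\left|y_{i,t} - \hat y_{i,t}\right|$$ $$RMSE = \sqrt{\frac{1}{N}\sum_{i,t}\left(y_{i,t} - \hat y_{i,t}\right)^2}$$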
Model Details
Assume that the historical traffic $y_{1:T,i}$ is known, the future actual traffic is $y_{T+1:T+\tau,i}$, the predicted traffic is $\hat y_{T+1:T+\tau,i}$, and $i$ is the category of the traffic (such as the App it belongs to). Traffic time series exhibit cyclical, trend, and autoregressive features. We design the following modules to capture these characteristics and aggregate the information to predict future values (a code sketch follows the list).
- The Embedding Layer of the model projects category information and time information into high-dimensional vectors. The category information expresses the differences between series, and the time information expresses the cyclicity of the time series: $$V_i = Embed(i)$$ $$V_t = Embed(t)$$
- Element-wise products of the time and category features with the historical traffic further extract per-series differential and cyclical features: $$\tilde V_i = V_i \odot y_{1:T}$$ $$\tilde V_t = V_t \odot y_{1:T}$$
- The first-order difference between adjacent time steps of the traffic series eliminates the trend and better expresses the periodicity of the series (the trend feature remains in the original series): $$\tilde y_{1:T-1} = y_{2:T} - y_{1:T-1}$$
- The input and structure of the Multilayer Perceptron (MLP) layer are: $$in = concat(V_i, V_t, \tilde V_i, \tilde V_t, \tilde y_{1:T-1}, y_{1:T})$$ $$\hat y_{T+1:T+\tau,i} = MLP(in)$$ The MLP aggregates the information from the feature modules above and predicts the time series of future time steps.
- The loss function of the model is the MSE: $$loss = \sum_{i,t}(y_{i,t}-\hat y_{i,t})^2$$
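Read together, the modules above amount to the following PyTorch sketch. It is a simplified reconstruction under stated assumptions (hidden size, class name, and tensor shapes are illustrative), not the actual Swish Net implementation:

```python
# Illustrative PyTorch sketch of the feature modules described above.
import torch
import torch.nn as nn

class SwishNetSketch(nn.Module):
    def __init__(self, num_series: int, num_time_slots: int, T: int = 12, tau: int = 12):
        super().__init__()
        # Embedding layer: V_i = Embed(i), V_t = Embed(t), sized T so they can
        # be multiplied element-wise with the history window y_{1:T}.
        self.embed_i = nn.Embedding(num_series, T)
        self.embed_t = nn.Embedding(num_time_slots, T)
        in_dim = 5 * T + (T - 1)  # V_i, V_t, V_i*y, V_t*y, y, and diff(y)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, tau),  # tau future points
        )

    def forward(self, y: torch.Tensor, i: torch.Tensor, t: torch.Tensor):
        V_i = self.embed_i(i)            # (batch, T) category embedding
        V_t = self.embed_t(t)            # (batch, T) time embedding
        diff = y[:, 1:] - y[:, :-1]      # first-order difference removes trend
        feats = torch.cat([V_i, V_t, V_i * y, V_t * y, y, diff], dim=-1)
        return self.mlp(feats)           # (batch, tau) predicted future traffic

# Training minimizes MSE, matching the loss above (up to a constant factor):
# criterion = nn.MSELoss()
```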
Full Algorithm Workflow
Finally, we can link the above two models with the relevant data sources to get the complete workflow of the IHPA predictive autoscaling algorithm, as shown in the following diagram:
Legend:
- The blue line represents the offline workflow, which requires a large amount of data & GPU & a longer execution time, and has a lower execution frequency.
- The yellow line represents the online workflow, which requires a moderate amount of data & CPU & a shorter execution time, and has a higher execution frequency.
Note:
- The "app" here is an abstract concept representing the smallest unit of autoscaling, typically a specific workload such as a Deployment.
- The "all traffics" here does not mean all monitorable traffic in the monitoring system, but the traffic components that are significantly positively correlated with the target resource usage indicator; these can be selected as needed for the specific application scenario.
2 - Algorithm
2.1 - Train Time Series Forecasting Model
Before you begin
This document will guide you on how to train the time series prediction deep learning model used by Kapacity.
Before getting started, please make sure that Conda has been installed in your environment.
Install dependencies
Execute the following command to download the code of the Kapacity algorithm version you are using, install algorithm dependencies, and activate the algorithm runtime environment:
git clone --depth 1 -b algorithm-<your-kapacity-algorithm-version> https://github.com/traas-stack/kapacity.git
cd kapacity/algorithm
conda env create -f environment.yml
conda activate kapacity
Prepare configuration file
The training script will read parameters related to dataset fetching and model training from an external configuration file, so we need to prepare this configuration file in advance. You can download this example configuration file tsf-model-train-config.yaml and modify its content as needed. The content of this file is as follows:
targets:
- workloadNamespace: default
  workloadRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nginx
  historyLength: 24H
  metrics:
  - name: qps
    type: Object
    object:
      metric:
        name: nginx_ingress_controller_requests_rate
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: nginx-server
freq: 10min
predictionLength: 3
contextLength: 7
hyperParams:
  learningRate: 0.001
  epochs: 100
  batchSize: 32
Here is an explanation of each field:

- `targets`: The list of workloads to be trained and their metric information (the algorithm supports training multiple metrics of multiple workloads in one model, but this also makes the model larger). If you wish to prepare the dataset manually instead of having the algorithm fetch it automatically, you can leave this field empty.
  - `workloadNamespace`: The namespace where the workload to be trained resides.
  - `workloadRef`: The reference identifying the workload object to be trained.
  - `historyLength`: The length of metrics history to fetch as the dataset, supporting the time units `min` (minutes), `H` (hours), and `D` (days). It is generally recommended to cover at least two complete cycles of the periodic metrics.
  - `metrics`: The list of metrics to be trained for this workload, in the same format as IHPA's metrics field. Note that you must set a different `name` for each metric to distinguish the metrics of the same workload within the same model.
- `freq`: The precision of the model, i.e., the unit of the `predictionLength` and `contextLength` parameters below. The currently supported values are `1min`, `10min`, `1H`, and `1D`. Note that this parameter does not affect the model size.
- `predictionLength`: The number of points the model predicts; the final prediction length is `predictionLength * freq`. The larger this parameter, the larger the model. It is generally not recommended to set it too large, because the further out a prediction point lies, the lower its accuracy.
- `contextLength`: The number of historical points the model references during prediction (inference); the final reference history length is `contextLength * freq`. The larger this parameter, the larger the model.
- `hyperParams`: The hyperparameters of the deep-learning model, which generally do not need to be adjusted.
Train model
Execute the following command to train the model, replacing the related parameters with your actual values:
python kapacity/timeseries/forecasting/train.py \
--config-file=<your-config-file> \
--model-save-path=<your-model-save-path> \
--dataset-file=<your-dataset-file> \
--metrics-server-addr=<your-metrics-server-addr> \
--dataloader-num-workers=<your-dataloader-num-workers>
Here is an explanation of each parameter:

- `config-file`: The path of the configuration file prepared in the previous step.
- `model-save-path`: The directory where the model will be saved.
- `dataset-file`: The path of a manually prepared dataset file. Specify exactly one of this parameter and `metrics-server-addr`.
- `metrics-server-addr`: The address of the metrics server for fetching the dataset automatically, i.e., an accessible address of the Kapacity gRPC service. Specify exactly one of this parameter and `dataset-file`.
- `dataloader-num-workers`: The number of subprocesses used to load the dataset; it is generally recommended to set this to the number of CPUs on the machine. If set to `0`, only the main process is used for loading.
Tips on how to access the Kapacity gRPC service
By default, Kapacity's gRPC service is exposed as a ClusterIP Service. You can use the following command to view its ClusterIP and port:
kubectl get svc -n kapacity-system kapacity-grpc-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kapacity-grpc-service ClusterIP 192.168.38.172 <none> 9090/TCP 5m
If you are running the training script inside the cluster, you can directly use this ClusterIP and port as `metrics-server-addr`.
If you are not running the training script inside the cluster, you can use the following command to access this Service from your local machine through kubectl's port forwarding, replacing `<local-port>` with a spare port on your local machine:
kubectl port-forward -n kapacity-system svc/kapacity-grpc-service <local-port>:9090
Then you can use `localhost:<local-port>` as `metrics-server-addr`.
After execution, you will see training logs similar to the following. Wait until the training completes (the command exits normally):
2023-10-18 20:05:07,757 - INFO: Epoch: 1 cost time: 55.25944399833679
2023-10-18 20:05:07,774 - INFO: Epoch: 1, Steps: 6 | Train Loss: 188888896.6564227
Validation loss decreased (inf --> 188888896.656423). Saving model ...
2023-10-18 20:05:51,157 - INFO: Epoch: 2 cost time: 43.38192820549011
2023-10-18 20:05:51,158 - INFO: Epoch: 2, Steps: 6 | Train Loss: 212027786.7585510
EarlyStopping counter: 1 out of 15
2023-10-18 20:06:30,055 - INFO: Epoch: 3 cost time: 38.89493203163147
2023-10-18 20:06:30,060 - INFO: Epoch: 3, Steps: 6 | Train Loss: 226666666.7703293
After the training is completed, you can find the trained model and its associated files in the `model-save-path` directory:
-rw-r--r-- 1 admin staff 316K Oct 18 20:18 checkpoint.pth
-rw-r--r-- 1 admin staff 287B Oct 18 20:04 estimator_config.yaml
-rw-r--r-- 1 admin staff 29B Oct 18 20:04 item2id.json
3 - Use Custom Metrics
Background
The Kubernetes Metrics API provides a set of common metric query interfaces within the Kubernetes system, but it only supports querying the current real-time value of a metric: it supports neither historical queries nor aggregated queries of Pod metrics at the workload level, which cannot meet the data needs of the various intelligent algorithms. Therefore, Kapacity further abstracts and extends the Metrics API, adding advanced capabilities such as historical metric queries and workload-level aggregation queries while remaining maximally compatible with existing usage habits.
Currently, the Metrics API provided by Kapacity supports the following two metric provider backends:
Use Prometheus as metric provider (default)
Set the startup parameter `--metric-provider` of Kapacity Manager to `prometheus` to use Prometheus as the metric provider.
If you use this provider, you only need Prometheus; there is no need to install the Kubernetes Metrics Server or any other Metrics Adapter (including the Prometheus Adapter).
You can find the `prometheus-metrics-config.yaml` configuration in the ConfigMap `kapacity-config` in the namespace where Kapacity Manager is located. By modifying this configuration, you can fully customize the Prometheus query statements for different metric types. The format of this configuration is fully compatible with the Prometheus Adapter configuration, so if you have previously configured custom HPA metrics with the Prometheus Adapter, you can reuse that configuration directly.
The default configuration provided by Kapacity is as follows:
resourceRules:
  cpu:
    containerQuery: |-
      sum by (<<.GroupBy>>) (
        irate(container_cpu_usage_seconds_total{container!="",container!="POD",<<.LabelMatchers>>}[3m])
      )
    readyPodsOnlyContainerQuery: |-
      sum by (<<.GroupBy>>) (
        (kube_pod_status_ready{condition="true"} == 1)
        * on (namespace, pod) group_left ()
        sum by (namespace, pod) (
          irate(container_cpu_usage_seconds_total{container!="",container!="POD",<<.LabelMatchers>>}[3m])
        )
      )
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
  memory:
    containerQuery: |-
      sum by (<<.GroupBy>>) (
        container_memory_working_set_bytes{container!="",container!="POD",<<.LabelMatchers>>}
      )
    readyPodsOnlyContainerQuery: |-
      sum by (<<.GroupBy>>) (
        (kube_pod_status_ready{condition="true"} == 1)
        * on (namespace, pod) group_left ()
        sum by (namespace, pod) (
          container_memory_working_set_bytes{container!="",container!="POD",<<.LabelMatchers>>}
        )
      )
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
  window: 3m
rules: []
externalRules:
- seriesQuery: '{__name__="kube_pod_status_ready"}'
  metricsQuery: sum(<<.Series>>{condition="true",<<.LabelMatchers>>})
  name:
    as: ready_pods_count
  resources:
    overrides:
      namespace:
        resource: namespace
workloadPodNamePatterns:
- group: apps
  kind: ReplicaSet
  pattern: ^%s-[a-z0-9]+$
- group: apps
  kind: Deployment
  pattern: ^%s-[a-z0-9]+-[a-z0-9]+$
- group: apps
  kind: StatefulSet
  pattern: ^%s-[0-9]+$
To support advanced query capabilities such as workload-level aggregation queries, we have extended some fields on top of the Prometheus Adapter configuration. Below is a brief explanation of these extended fields:
- `workloadPodNamePatterns`: Some Kapacity algorithms need workload-level metric information, such as the total CPU usage of a workload's Pods or the number of Ready Pods of a workload. Kapacity aggregates Pod-level metrics by matching Pod names against regular expressions derived from the workload name, so this field configures the Pod-name patterns for different workload types (see the sketch after this list). If you use workload types beyond those in the default configuration, add the corresponding entries to this field.
- `readyPodsOnlyContainerQuery`: Some Kapacity algorithms query the total resource usage of a workload's Pods under an extra condition, such as counting only the workload's Ready Pods. For such queries, a separate PromQL statement must be provided through this field. Kapacity's default statement is based on the metrics provided by kube-state-metrics; you can replace it with another implementation as needed.
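As an illustration of how a `workloadPodNamePatterns` entry is applied, the `%s` placeholder is substituted with the workload name and the result is used as a regular expression over Pod names. This is a hypothetical sketch, not Kapacity's actual matching code:

```python
# Hypothetical illustration of workloadPodNamePatterns matching.
import re

# Deployment entry from the default configuration above.
pattern = "^%s-[a-z0-9]+-[a-z0-9]+$"
workload_name = "nginx"
pod_regex = re.compile(pattern % workload_name)

print(bool(pod_regex.match("nginx-7c79c4bf97-5kfzs")))       # True: Pod of "nginx"
print(bool(pod_regex.match("other-app-7c79c4bf97-5kfzs")))   # False: different workload
```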
Use Kubernetes Metrics API as metric provider (not recommended)
Set the startup parameter `--metric-provider` of Kapacity Manager to `metrics-api` to use the Kubernetes Metrics API as the metric provider.
If you use this provider, you need to install the Kubernetes Metrics Server or another Metrics Adapter (such as the Prometheus Adapter) to abstract away the differences of the underlying monitoring system.
Note, however, that this backend supports neither historical metric queries nor workload-level aggregation of Pod metrics, so its applicability is very limited: it is only suitable for scenarios that use simple algorithms, such as Reactive Scaling.
4 - FAQ
IHPA
Predictive algorithm job failed with error `replicas estimation failed`
This error occurs because the "Traffic, Capacity and Replicas Relationship Modeling" algorithm failed to produce a usable model. The following approaches may solve the issue:

- Try increasing the length of historical data by adjusting the algorithm job parameter `--re-history-len`.
- Using the detailed model evaluation information returned with the error, try relaxing the model validation requirements by adjusting the algorithm job parameters `--re-min-correlation-allowed` and `--re-max-mse-allowed`. Be aware, however, that if the relaxed values differ too much from the defaults, the accuracy of the model may be hard to guarantee.