Introduction
- A comprehensive review of the literature on Kubernetes scheduling algorithms, targeting four sub-categories: generic scheduling, multi-objective optimization-based scheduling, AI-focused scheduling, and autoscaling-enabled scheduling.
- A critical evaluation of the strengths and limitations of existing approaches.
- Identification of gaps and open questions in the existing literature.
Search methodology
Literature review
- Objectives
- Methodology/Algorithms
- Experiments
- Findings
- Applications
- Limitations
Scheduling in Kubernetes
# | Objectives | Methodology/Algorithms | Experiments | Findings | Applications | Limitations |
---|---|---|---|---|---|---|
[15] | To develop a network-aware scheduling strategy for container-based applications in Smart City deployments. | A network-aware scheduling method is proposed and implemented as an extension to the default Kubernetes scheduling mechanism. | Evaluated using container-based Smart City applications and validated on the Kubernetes platform. | Network latency is reduced by 80% compared to the default scheduling mechanism. | Can be used in Fog Computing environments for delay-sensitive and data-intensive services. | Further testing and implementation may reveal limitations and future improvements. |
[16] | Cost-aware container scheduling in the public cloud. | A cluster scheduler focused on orchestrating batch job execution on virtual clusters, termed Stratus. | Simulation experiments were conducted using cluster workload traces from Google and TwoSigma. | Stratus reduces virtual cluster scheduling costs by 17–44% compared to state-of-the-art approaches. | Batch job execution on virtual clusters in the public cloud | Limited to the context of batch job execution on virtual clusters in the public cloud |
[17] | To improve performance and ensure fairness among users in a shared cluster with interchangeable hardware resources for deep learning frameworks. | Min-cost bipartite matching | Large-scale simulations and evaluations on a small-scale CPU-GPU hybrid cluster. | AlloX can drastically shorten average job completion time, eliminate starvation, and ensure fairness. | Scheduling jobs over interchangeable resources in a shared cluster | Extending beyond two types of interchangeable resources may introduce significant complexity. |
[18] | To optimize containers' initial placement via task packing, adjust cluster size to match changing workloads through autoscaling algorithms, and develop a rescheduling mechanism that shuts down underutilized VM instances for cost savings while preserving task progress. | Heterogeneous job configurations; autoscaling algorithms; rescheduling mechanism | Validated using the Australian National Cloud Infrastructure (Nectar). | Compared with the standard Kubernetes framework, the proposed solution lowers overall costs by 23% to 32% for various cloud workload patterns. | Low-cost container orchestration on Kubernetes-powered cloud computing infrastructures. | VM types may also be taken into consideration. |
[19] | To develop a GPU-aware resource orchestration layer for datacenters, improve resource utilization, reduce operational costs, and improve Quality of Service (QoS) for user-facing queries. | Presented Kube-Knots, a GPU-aware resource orchestration layer integrated with Kubernetes. Kube-Knots harvests spare compute cycles through dynamic container orchestration. Two GPU-based scheduling methods (CBP and PP) are created to schedule workloads at datacenter scale using Kube-Knots. | Evaluated CBP and PP on a ten-node GPU cluster and compared results with state-of-the-art schedulers. | For HPC applications, CBP and PP increase cluster-wide GPU utilization by up to 80% on average. Average job completion for deep learning workloads improved by up to 36%, with a 33% cluster-wide energy reduction. For latency-critical queries, PP ensures end-to-end QoS by lowering QoS violations by up to 53%. | To improve resource utilization and reduce operational costs in GPU-based datacenters | – |
[20] | To improve the efficiency of data centers through holistic scheduling in Kubernetes, considering virtual and physical infrastructures as well as business processes. | The Kubernetes default scheduler is replaced with a proposed all-encompassing scheduling framework in which both software and hardware models are considered. The system was deployed in a real data center. | Deployment in a real data center | Reductions in power consumption of 10% to 20% were noted; an intelligent scheduler can significantly increase the effectiveness of a data center. | To improve efficiency of data centers through software-based solutions | Further research is needed in this area |
[21] | A new Kubernetes container scheduling strategy (KCSS) is introduced to boost the scheduling efficiency of many online-submitted containers. | Using a variety of factors, the best node is chosen for each newly submitted container. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) combines all criteria into a single rank. | Conducted experiments on different scenarios using data from cloud infrastructure and user needs | KCSS enhances performance compared with other container scheduling algorithms. | Can be used in industrial and academic fields for container-orchestration systems | Limited to the six key criteria used in the experiments; the criteria could be expanded to improve performance further in future work. |
[22] | Presented a topology-based Kubernetes GPU scheduling mechanism to increase resource efficiency and load distribution in the GPU cluster. | The system builds on the established Kubernetes GPU scheduling mechanism. The GPU cluster topology is reconstructed as a resource access cost tree, which is used to schedule and adapt to various GPU resource application scenarios. | Tencent has employed GaiaGPU in actual production. | Improved resource utilization by about 10% and improved load balance. | Used in production at Tencent | – |
[23] | To develop a context-aware Kubernetes scheduler that takes into account physical, operational, and network parameters in order to improve service availability and performance in 5G edge computing | Real-time edge device data integration into the scheduler decision algorithm. | Comparison with the default Kubernetes scheduler | The suggested scheduler offers increased fault tolerance capabilities along with advanced orchestration and management. | 5G edge computing | – |
[24] | To develop a policy-driven meta-scheduler for Kubernetes clusters that enables efficient and fair resource allocation for multiple users | Dominant Resource Fairness (DRF) policy Additional fairness metrics based on task resource demand and average waiting time | – | The proposed meta-scheduler improves fairness in multi-tenant Kubernetes clusters | Kubernetes clusters | – |
[25] | To modify Kubernetes to be better suited for edge infrastructure, with a focus on network latency and self-healing capabilities | Custom Kubernetes scheduler that considers applications' delay constraints and edge reliability | – | The modified Kubernetes is better suited for edge infrastructure | Edge computing | – |
[26] | To improve Kubernetes scheduling for performance-sensitive containerized workflows, particularly in the context of 5G edge applications | NetMARKS is a novel Kubernetes pod scheduling method that makes use of dynamic network metrics gathered with Istio Service Mesh. | Validated using different workloads and processing layouts | NetMARKS can save up to 50% of inter-node bandwidth while reducing application response times by up to 37%. | Kubernetes in 5G edge computing and machine-to-machine communication | – |
[27] | Create a feedback control approach for elastic container provisioning of Web systems on Kubernetes-based platforms. | Combining a linear model with a varying-processing-rate queuing model improves modeling accuracy. | Evaluated on a real Kubernetes cluster | Compared with state-of-the-art algorithms, the suggested approach achieves the lowest percentage of SLA violations and the second-lowest cost. | Elastic container provisioning in Kubernetes-based systems | – |
[28] | Create a dynamic Kubernetes scheduler to help a heterogeneous cluster deploy Docker containers more effectively, utilizing past container execution data to speed up task completion. | Developed the KubCG dynamic scheduling platform and introduced a new scheduler that takes into account historical container execution data as well as the timetable for Kubernetes Pods. | Conducted different tests to validate the new algorithm | In experiments, KubCG reduced task completion time to as little as 64% of the original. | Used for the deployment of cloud-based services that require GPUs for tasks like deep learning and video processing. | Further testing and validation are needed to determine the effectiveness of the algorithm in a variety of scenarios. |
[29] | Describe a new method for arranging workloads in a Kubernetes cluster. | Framework model for hybrid shared-state scheduling; scheduling decisions are determined on the basis of the cluster's overall state. | Tested proposed scheduler behavior under different scenarios, including failover/recovery in a deployed Kubernetes cluster | The suggested scheduler operates in circumstances such as priority preemption or collocation interference, and includes features of both centralized and distributed scheduling frameworks. | Used in Kubernetes clusters to optimize resource utilization | Further testing and implementation are needed to fully evaluate the effectiveness of the proposed scheduler. |
[30] | Develop and put into use KubeHICE, a container orchestrator for heterogeneous ISA architectures on cloud edge platforms. Assess the efficiency and performance of KubeHICE in handling heterogeneous-ISA clusters. | By using AIM and PAS, KubeHICE expands open source Kubernetes. AIM automatically locates a node that is appropriate for the ISAs that the containerized application supports. PAS schedules containers based on the computational capacity of cluster nodes. | KubeHICE was tested in several real-world scenarios. | KubeHICE is efficient in performance estimation and resource scheduling while adding no further overhead to container orchestration. When handling heterogeneity, KubeHICE can improve CPU utilization by up to 40%. | KubeHICE is beneficial for containerized applications in heterogeneous cloud-edge platforms | – |
[31] | Make the Kubernetes scheduler more efficient by incorporating the disk I/O load. | A dynamic scheduling approach called Balanced-Disk-IO-Priority (BDI) is proposed to improve the disk I/O balance between nodes. The Balanced-CPU-Disk-IO-Priority (BCDI) dynamic scheduling algorithm is also presented to address unbalanced CPU and disk I/O load on a single node. | Experimental comparison of BDI and BCDI against Kubernetes' default scheduling algorithms. | BDI and BCDI resolve the CPU and disk I/O load imbalance on a single node, enhance the disk I/O balance between nodes, and outperform the default Kubernetes scheduling algorithms. | Can be used to improve the performance of Kubernetes in managing containerized applications | Further research may be needed to optimize the BDI and BCDI algorithms and evaluate their performance in different scenarios. |
[32] | Investigate how Serverless frameworks built on Kubernetes systems can schedule pods more efficiently in large-scale concurrent applications. | To further maximize the effectiveness of pod scheduling in Serverless cloud paradigms, a scheduling approach leveraging concurrent scheduling of the same pod is proposed. | Preliminary verification is performed to test the effectiveness of the proposed algorithm. | The suggested approach can significantly cut down on pod startup time while maintaining resource balance on each node. | The proposed algorithm is used to improve efficiency of pod scheduling in Serverless cloud paradigms. | The effectiveness is only verified via preliminary experiments. Also, the algorithm is only applicable to Serverless frameworks. |
[33] | Present a Kubernetes scheduler extension and resource rescheduling approach that incorporates QoE metrics into SLOs. | Use the QoE metric proposed in the ITU-T P.1203 standard. | Evaluate the architecture using video streaming services co-located with other services | The scheduler extension increased average QoE by 50%; resource rescheduling raised it by 135%. Over-provisioning was completely eliminated by the suggested architecture. | Improving QoE in cloud environments | Limited to the specific QoE metric used; further research may be needed to evaluate the proposed architecture with other QoE metrics. |
[34] | Enable the safe colocation of best-effort jobs and latency-sensitive services in Kubernetes clusters to increase resource utilization, flexibly divide resources among workload categories, and improve hardware and software isolation capabilities for containers. | Zeus was developed on Kubernetes extension mechanisms. It schedules best-effort jobs based on actual server utilization and improves container isolation by coordinating hardware and software isolation mechanisms. | Zeus is assessed in a large-scale production setting using latency-sensitive services and best-effort jobs. | Zeus can increase average CPU utilization from 15% to 60% without violating SLOs, significantly improving how efficiently Kubernetes clusters use their resources. | Zeus can be used to improve the resource utilization of Kubernetes clusters | – |
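Several of the schedulers above reduce multi-criteria node selection to a single rank; KCSS [21], for instance, folds six criteria into one score with TOPSIS. The sketch below illustrates the TOPSIS ranking step on hypothetical node metrics and weights (the criteria, their values, and the weights are illustrative, not the ones used in the paper):

```python
import math

def topsis_rank(matrix, weights, benefit):
    """Rank alternatives (e.g. candidate nodes) with TOPSIS.

    matrix:  rows = alternatives, columns = criteria values
    weights: importance of each criterion (should sum to 1)
    benefit: True if larger is better for that criterion, else False
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # Ideal best/worst per criterion depend on benefit vs. cost direction.
    best = [max(col) if benefit[j] else min(col)
            for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))  # closeness in [0, 1]
    # Highest closeness coefficient first.
    return sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)

# Hypothetical nodes scored on (free CPU, free memory, disk I/O pressure).
nodes = [[4.0, 16.0, 0.7],   # node 0
         [8.0, 8.0, 0.2],    # node 1
         [2.0, 32.0, 0.9]]   # node 2
ranking = topsis_rank(nodes, weights=[0.5, 0.3, 0.2],
                      benefit=[True, True, False])
print(ranking)  # best node first: [1, 2, 0]
```

The third criterion is marked as a cost criterion (`benefit=False`), so lower disk I/O pressure pulls a node toward the ideal solution; this direction handling is what lets a single rank combine heterogeneous metrics.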
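The meta-scheduler in [24] builds on Dominant Resource Fairness (DRF), which repeatedly grants the next task to the user whose dominant share (the largest fraction of any single resource they hold) is smallest. A minimal sketch on the classic two-resource example from the DRF literature (user names, demands, and capacities are illustrative):

```python
def drf_pick(demands, allocated, capacity):
    """Return the user who should receive the next task under DRF,
    or None if no user's next task still fits in the cluster."""
    used = {r: sum(allocated[u][r] for u in allocated) for r in capacity}

    def dominant_share(user):
        return max(allocated[user][r] / capacity[r] for r in capacity)

    fitting = [u for u in demands
               if all(used[r] + demands[u][r] <= capacity[r] for r in capacity)]
    if not fitting:
        return None
    return min(fitting, key=dominant_share)

# Classic example: 9 CPUs and 18 GB; A's tasks are memory-heavy,
# B's tasks are CPU-heavy.
capacity = {"cpu": 9, "mem": 18}
demands = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
allocated = {"A": {"cpu": 0, "mem": 0}, "B": {"cpu": 0, "mem": 0}}
tasks = {"A": 0, "B": 0}
while (user := drf_pick(demands, allocated, capacity)) is not None:
    for r in capacity:
        allocated[user][r] += demands[user][r]
    tasks[user] += 1
print(tasks)  # {'A': 3, 'B': 2}
```

Equalizing dominant shares gives A three tasks (dominant resource: memory) and B two tasks (dominant resource: CPU), which is the allocation DRF is known for on this example.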
Scheduling using multi-objective optimization
# | Objectives | Methodology/Algorithms | Experiments | Findings | Applications | Limitations |
---|---|---|---|---|---|---|
[35] | Present a capable controller for managing containers on edge-cloud nodes in Industrial IoT contexts while accounting for interference and energy use. | Integer linear programming based on multi-objective optimization. | Data obtained in real time from the Google compute cluster. | The proposed Kubernetes-based energy and interference driven scheduler (KEIDS) improves performance for end users by minimizing the energy consumption of edge-cloud nodes and scheduling applications optimally with minimal interference. | Container management and scheduling for Industrial IoT. | – |
[36] | Build a multi-objective scheduling model for container-based microservices and suggest an ant colony method to handle scheduling issues. | Ant colony algorithm | Real data from Alibaba cluster trace V2018, containing an application with 17 microservices | The suggested optimization method outperformed previous relevant algorithms in optimizing cluster service reliability, cluster load balancing, and network transmission overhead. | Container-based microservice scheduling in cloud architectures | High time complexity; evaluation on a real cloud container platform is still needed. |
[37] | To improve Kubernetes' resource scheduling scheme | The authors examine the source code of Kubernetes' scheduling module, extract its model, and create and carry out a simulation experiment using the model. The K8s scheduling model is then enhanced by combining the ant colony and particle swarm optimization algorithms. | The authors schedule resources for K8s using the Java programming language and the CloudSim tool. | The experimental results demonstrate that the suggested approach outperforms the original scheduling technique, resulting in a lower overall resource cost, a higher maximum node load, and more evenly distributed job assignment. | Kubernetes can deploy containerized applications on a wide scale in private, public, and hybrid cloud environments using the better resource scheduling scheme. | – |
[38] | To describe in more detail the idea of container placement and migration in edge computing, as well as to examine the scheduling models created for this purpose. | The container placement problem can be abstracted using graph network models or multi-objective optimization models. Algorithms based on heuristics to address the scheduling issue. | – | Most existing container scheduling models are heuristic-based and consider only static edge computing tasks, with limited research on decentralized scheduling systems | Container-based edge computing | Future research in container scheduling should focus on decentralized systems and mobile edge nodes. |
[39] | Increase virtual network resilience by assigning virtual network functions (VNFs) to appropriate locations. | Optimization models and heuristic algorithms to solve VNF placement problems | – | Implementation of function scheduler plugins that can connect multiple optimization models with Kubernetes and allocate functions automatically | Allocating VNFs to nodes in Kubernetes | – |
AI focused scheduling
# | Objectives | Methodology/Algorithms | Experiments | Findings | Applications | Limitations |
---|---|---|---|---|---|---|
[46] | To develop an effective deep learning cluster resource scheduler. | Predicts model convergence during training via online fitting and creates performance models to calculate training speed as a function of the resources allotted to each job. Deep learning tasks are placed, and resources are dynamically allocated in order to reduce the amount of time needed to complete each task. | Deployed on a deep learning cluster that runs 9 MXNet training jobs on 7 CPU servers and 6 GPU machines. | Optimus performs about 139% and 63% better than comparable cluster schedulers in terms of job completion time and makespan, respectively. | Can be used in production clusters with deep learning workloads. | Further testing and implementation may reveal limitations and future improvements. |
[47] | Data processing job scheduling on distributed computing clusters. | Utilize neural networks and reinforcement learning to learn workload-specific scheduling algorithms. | Spark integration prototype on a 25 node cluster. | Comparing Decima to hand-tuned scheduling heuristics, the average job completion time is improved by at least 21%. | Scheduling data processing jobs on distributed compute clusters | Open research studies on resource management and computation optimization in edge computing |
[48] | Develop a fair share scheduler for deep learning training on GPU clusters that strikes a balance between the competing demands of efficiency and fairness. | Gandivafair provides performance isolation between users and allocates cluster-wide GPU time fairly among active users. Gandivafair incentivizes users to use older GPUs with a novel resource trading mechanism that maximizes cluster efficiency without affecting fairness guarantees. | Realistic multi-user workloads were used to implement and assess the system in a heterogeneous 200-GPU cluster. | Gandivafair achieves both fairness and efficiency. | Can be used in GPU clusters for deep learning training. | Further testing and implementation may reveal limitations and future improvements. |
[49] | Fully exploit and harness the power of big data, speed up processing times, and enhance overall Kubernetes cluster performance. | Presented a progress-based container placement strategy (ProCon). ProCon takes into account both workers' current resource usage and projected future resource demand, decreasing completion time and makespan while balancing resource contention across the cluster. | Extensive experiments conducted to test ProCon | ProCon boosts overall performance by 23.0% and can cut completion times for certain jobs by up to 53.3%. It shows a makespan improvement of up to 37.4% over the default Kubernetes scheduler. | To improve performance of Kubernetes clusters | – |
[50] | Create a general-purpose, effective deep learning cluster scheduler to get the most out of expensive deep learning clusters. | Presented DL2, a deep-learning-driven scheduler for deep learning clusters that combines supervised learning and reinforcement learning: offline supervised learning warms up the neural network, and reinforcement learning fine-tunes it during DL job training. DL2 makes online resource allocation decisions for jobs using neural networks. | DL2 was developed on Kubernetes, allowing dynamic resource scaling of DL jobs on MXNet. A thorough analysis compared DL2 with the fairness scheduler (DRF) and the expert heuristic scheduler (Optimus). | In terms of average job completion time, DL2 outperforms DRF by 44.1% and Optimus by 17.5%. | To improve resource scheduling in deep learning clusters | – |
[51] | To improve scheduling and resource management for deep learning workloads in Kubernetes clusters | SpeCon, a container scheduler tailored for short-lived deep learning applications, was proposed. The scheduler builds on virtualized containers such as Kubernetes and Docker; its algorithms track training progress and speculatively migrate slow-growing models to free up resources for quickly expanding ones. | Extensive experiments were performed to evaluate the proposed scheduler | SpeCon reduces individual job completion time by up to 41.5%, improves makespan by 24.7%, and improves system performance by 14.8%. | To optimize scheduling for deep learning workloads on Kubernetes | – |
[52] | Develop RLSK, a deep reinforcement learning-based job scheduler that can adaptively distribute independent batch jobs across multiple federated cloud computing clusters. | RLSK is based on reinforcement learning and is implemented on Kubernetes | Simulations are conducted to evaluate the performance of RLSK | RLSK outperforms traditional scheduling algorithms | Scheduling independent batch jobs in federated cloud computing clusters. | – |
[53] | Create a machine learning cluster scheduling system and increase its efficiency and precision. | A heuristic scheduling approach that takes an ML job's spatial and temporal characteristics into account. When the system is overloaded, this system load control method removes tasks that produce little to no gain in accuracy and shifts tasks from overloaded servers to underloaded servers depending on task priority. | Large-scale simulations based on actual data and actual experiments | When compared to current ML job schedulers, MLFS decreases JCT by up to 53%, makespan by up to 52%, and improves accuracy by up to 64%. | Job scheduling for large-scale machine learning clusters | – |
[54] | Presented a learning-based scheduling framework for edge-cloud systems focused on Kubernetes (KaiS) that raises the throughput rate of processing requests over the long term. | In order to provide decentralized request dispatch and dynamic dispatch spaces within the edge cluster, KaiS employs a coordinated multi-agent actor-critic method. In order to reduce the orchestration dimensionality by stepwise scheduling, it additionally employs graph neural networks to embed system state information and combines the results with multiple policy networks. | Experiments were conducted using real workload traces. | Regardless of request arrival patterns or system scales, KaiS was successful in learning the proper scheduling policies. In comparison to baselines, it increased system throughput rate by 14.3% and decreased scheduling cost by 34.7%. | KaiS can be used to improve the performance of Kubernetes-oriented edge-cloud systems. | – |
[55] | Create a custom scheduler for the Kubernetes orchestrator that uses a model-based, multi-purpose Multi-Agent System (MAS) platform to divide the scheduling task among the processing nodes. | MAS platform Kubernetes orchestrator | – | The new scheduling approach is faster than the default scheduler of Kubernetes. | Fog-in-the-loop (FIL) applications | – |
[56] | Develop a Kubernetes scheduling plan that blends an LSTM neural network prediction method with grey system theory. | The method uses grey system theory and LSTM neural network prediction to optimize the container scheduling algorithm. | The method was tested experimentally. | The algorithm reduced resource fragmentation in the cluster and increased cluster resource utilization. | The method can be used to improve the performance of Kubernetes in managing containers in a cluster. | Future work may focus on improving the performance of the algorithm and testing it in more complex scenarios. |
[57] | Create a dynamic resource scheduler in Kubernetes for distributed deep learning training. | Combine the methods of DRAGON and OASIS to create a scheduler that supports weighted autoscaling and gang scheduling for its jobs. | Evaluation using a set of Tensorflow jobs | Scheduler increases training speed by over 26% compared to default Kubernetes scheduler | Used for distributed deep learning training in Kubeflow | – |
[58] | Propose a more effective scheduling method for deep learning platforms that accommodates team cooperation, integrating a Docker-based deep learning platform with the improved Kubernetes scheduling algorithm. | Model the team's users as virtual clusters and routinely check the load on the clusters. Use a Docker-based deep learning platform with an improved Kubernetes scheduling algorithm. | The proposed scheduling algorithm is tested using a deep learning platform and multiple teams of users | The proposed algorithm ensures load balance and meets the needs of users | The proposed scheduling algorithm can be used in deep learning platforms to support multi-team collaboration | – |
[59] | Offer a balanced, fine-grained scheduling mechanism for deep learning workloads. | Create a balanced, fine-grained scheduling model that takes into account the DL task's resource consumption characteristics. It is suggested to build a scheduling system named KubFBS using specialized GPU sniffer and balance-aware scheduler modules. | The proposed system is evaluated using real-world DL tasks and the cluster is a 16-node Kubernetes cluster | KubFBS expedites the completion of DL activities and enhances the cluster's capacity for load balancing. | KubFBS can be used to schedule deep learning tasks in a Kubernetes cluster | Further experiments and evaluations are needed to demonstrate the effectiveness of the proposed system in different scenarios. Future work could also focus on improving the scalability of the proposed system. |
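Several of the surveyed systems (e.g., RLSK [52] and DL2 [50]) learn placement policies with reinforcement learning rather than hand-tuned heuristics. The sketch below is a deliberately tiny, hypothetical version of that idea: tabular Q-learning over discretized node loads, with load imbalance as the negative reward. The real systems use neural policies, much richer state, and rewards tied to job completion time:

```python
import random

def train_rl_scheduler(n_nodes=3, episodes=300, jobs_per_episode=12,
                       alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Toy tabular Q-learning scheduler: pick a node for each arriving job.

    State  = tuple of per-node load counts (capped for a finite table)
    Action = index of the node to place the job on
    Reward = negative load imbalance after placement
    """
    rng = random.Random(seed)
    q = {}  # (state, action) -> value

    def state_of(loads):
        return tuple(min(l, 5) for l in loads)  # cap keeps the table small

    for _ in range(episodes):
        loads = [0] * n_nodes
        for _ in range(jobs_per_episode):
            s = state_of(loads)
            if rng.random() < epsilon:           # explore
                a = rng.randrange(n_nodes)
            else:                                # exploit
                a = max(range(n_nodes), key=lambda n: q.get((s, n), 0.0))
            loads[a] += 1
            reward = -(max(loads) - min(loads))  # penalize imbalance
            s2 = state_of(loads)
            best_next = max(q.get((s2, n), 0.0) for n in range(n_nodes))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return q

def schedule(q, loads, n_nodes=3):
    """Greedy placement using the learned Q-table."""
    s = tuple(min(l, 5) for l in loads)
    return max(range(n_nodes), key=lambda n: q.get((s, n), 0.0))

q = train_rl_scheduler()
node = schedule(q, [2, 0, 1])
```

Because the reward penalizes imbalance, a trained policy tends to route arriving jobs to the least-loaded node, which is the behavior the learned schedulers above discover (and improve upon) from workload traces.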
Autoscaling-enabled scheduling
# | Objectives | Methodology/Algorithms | Experiments | Findings | Applications | Limitations |
---|---|---|---|---|---|---|
[60] | Create a multi-level dynamic autoscaling technique for containerized apps. | A dynamic multi-level (DM) autoscaling technique employing monitoring information from the infrastructure and applications. | Both simulated and actual workload settings. | The DM method outperforms other autoscaling techniques already in use. | DM method implementation for time-sensitive cloud applications in the SWITCH system. | Limited to the context of container-based cloud applications implemented in the SWITCH system. |
[61] | Create a system that can flexibly change how many containers are operating in a Kubernetes cluster. To include container migration into the Kubernetes VPA system in a non-disruptive manner. | Resource Utilization Based Autoscaling System (RUBAS) | Multiple scientific benchmarks | RUBAS increases the cluster's CPU and memory consumption by 10% and decreases runtime by 15%, with a 5%–20% overhead for each application. | Dynamic allocation of containers running in a Kubernetes cluster | – |
[62] | To make cloud-based applications' management easier and more effective To improve Kubernetes autoscaling decisions by adapting to actual variability of incoming requests | In a short-term assessment loop, various machine learning forecasting techniques compete with one another. Compact management parameter that application providers can use to find the ideal trade-off between resource over-provisioning and SLA violations. | Simulations and measurements on gathered Web traces were used to assess the scaling engine and management parameter. | In comparison to the default baseline, the multi-forecast scaling engine produces much fewer dropped requests with somewhat higher provided resources. | To improve Kubernetes autoscaling decisions in cloud-based applications | – |
[63] | To develop an adaptive autoscaling algorithm for Kubernetes pods To automatically detect optimal resource set for pods and manage horizontal scaling process | Libra automatically determines the best resource combination for a single pod and updates resource description for the pod and the horizontal scaling procedure in the dynamic environment. | – | – | To improve scalability of Kubernetes pods | – |
[64] | To develop an adaptive AI-based autoscaling system for Kubernetes that makes better scaling decisions and is easier for application providers to use | Various AI-based forecast methods Short-term evaluation loop | Simulations Collected web traces | The approach yields noticeably fewer dropped requests and slightly more supplied resources. | Kubernetes | – |
[65] | Improve the efficiency of Kubernetes resource scaling. | Proposed a Kubernetes autoscaler based on Pod replica prediction. | Conducted experiments to verify the proposed autoscaler | The autoscaler had a faster response speed | Can be used to improve resource scaling in Kubernetes | Further research and experimentation are needed to fully evaluate the proposed autoscaler. |
[66] | Present an improved automatic scaling plan for Kubernetes based on a variety of node types with pre-loaded images. | The suggested method incorporates the benefits of various node types in the scaling process. | The proposed scheme is tested and compared with the default autoscaler | The suggested approach decreases instability within the active clusters and enhances system performance under high load pressure. | The proposed scheme improved the performance and stability of the system under load pressure | Further testing and evaluation are needed to determine the full range of applications and limitations of the proposed scheme. |
[67] | Address the diversity of 5G use cases with maximum flexibility and cost effectiveness. Improve network functions availability and resilience | Modular design for network functions. Statistical approach for modeling and resolution of resource allocation problem. | Used Kubernetes infrastructure hosting different network services | The suggested technique protects crucial operations while preventing resource limitation in cluster nodes. | Can be applied to Kubernetes infrastructure hosting network services to improve availability and resilience | Limited information provided on experimental setup and specific results obtained |
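Most of the autoscalers above extend or replace the core rule of Kubernetes' Horizontal Pod Autoscaler, which computes the desired replica count from the ratio of the observed metric to its target: desired = ceil(current × currentMetric / targetMetric). A minimal sketch of that rule (the 10% tolerance band mirrors the HPA default; the min/max bounds and parameter names are illustrative):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    skipping changes when the ratio is within the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:  # close enough to target: no scaling
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas at 90% average CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
# 62% vs. a 60% target is inside the tolerance band -> stay at 4.
print(desired_replicas(4, current_metric=62, target_metric=60))  # 4
```

The predictive and AI-based approaches in [62] to [66] effectively replace `current_metric` with a forecast of future demand, so replicas are provisioned before load arrives rather than in reaction to it.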
Discussion, challenges & future suggestions
- As Kubernetes becomes more popular, there will be a growing need for advanced computation optimization techniques. In the future, Kubernetes may benefit from the development of more sophisticated algorithms for workload scheduling and resource allocation, potentially using AI or machine learning. Additionally, integrating Kubernetes with emerging technologies like serverless computing could lead to even more efficient resource usage by enabling dynamic scaling without pre-provisioned infrastructure. Ultimately, the future of computation optimization in Kubernetes is likely to involve a combination of cutting-edge algorithms, innovative technologies, and ongoing advancements in cloud computing.
-
Testing and implementation to reveal limitations of current learning algorithms for scheduling, and potential improvements on large-scale clusters. One important focus is improving the tooling and automation around testing and deployment, including the development of new testing frameworks and the integration of existing tools into the Kubernetes ecosystem. Another key area is the ongoing refinement of Kubernetes' implementation and development process, with a focus on streamlining workflows, improving documentation, and fostering greater collaboration within the open-source community. There is also a growing emphasis on more comprehensive testing and validation strategies for Kubernetes clusters, including advanced techniques such as chaos engineering to simulate real-world failure scenarios. A number of methods already employ learning algorithms for resource balancing inside and outside the cluster; even though these methods have given encouraging results, new learning algorithms can still be devised to improve the scheduler, especially on large-scale clusters.
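One simple family of learning algorithms that such research could explore is multi-armed bandits: treat each node pool as an arm and learn which placements yield the best observed reward (for instance, the negative of job completion time). The epsilon-greedy sketch below is purely illustrative; the pool names and reward signal are assumptions, not a published scheduler:

```python
import random

class EpsilonGreedyPlacer:
    """Learn which node pool serves jobs best via epsilon-greedy exploration."""

    def __init__(self, pools, epsilon=0.1, seed=None):
        self.pools = list(pools)
        self.epsilon = epsilon
        self.counts = {p: 0 for p in self.pools}
        self.value = {p: 0.0 for p in self.pools}   # running mean reward per pool
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.pools)               # explore
        return max(self.pools, key=lambda p: self.value[p])  # exploit

    def feedback(self, pool, reward):
        """Update the running mean reward, e.g. negative job completion time."""
        self.counts[pool] += 1
        n = self.counts[pool]
        self.value[pool] += (reward - self.value[pool]) / n
```

The scheduler would call `choose()` at placement time and `feedback()` once the job's outcome is observed; richer contextual bandits or full reinforcement learning are the natural next step on large clusters.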
-
Limitations and potential improvements in specific contexts, e.g., Green Computing. Minimizing the carbon footprint of a cluster is an ongoing challenge. Advanced schedulers need to be proposed to reduce the energy consumption and carbon footprint of clusters in IIoT setups. There is considerable opportunity for improving existing methods and proposing new ones in this area.
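To make the Green Computing direction concrete, a carbon-aware scheduler could estimate the emissions of placing a pod on each node from the node's power draw and the carbon intensity of its region, then prefer the greenest feasible node. The region names, intensity figures, and power model below are invented for illustration:

```python
def carbon_score(node, pod_cpu, carbon_intensity):
    """Estimated gCO2/h for running the pod on this node:
    requested cores * node watts-per-core * regional gCO2/kWh / 1000."""
    watts = pod_cpu * node["watts_per_core"]
    return watts / 1000.0 * carbon_intensity[node["region"]]

def pick_greenest(nodes, pod_cpu, carbon_intensity):
    """Return the feasible node with the lowest estimated emissions, or None."""
    feasible = [n for n in nodes if n["free_cpu"] >= pod_cpu]
    if not feasible:
        return None
    return min(feasible, key=lambda n: carbon_score(n, pod_cpu, carbon_intensity))

intensity = {"eu-north": 40, "us-east": 380}   # illustrative gCO2/kWh values
cluster = [
    {"name": "n1", "region": "us-east", "watts_per_core": 10, "free_cpu": 8},
    {"name": "n2", "region": "eu-north", "watts_per_core": 14, "free_cpu": 4},
]
```

Note the trade-off the sketch exposes: the greener region may host less power-efficient hardware, so neither signal alone is sufficient.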
-
Future research in Kubernetes resource management. Kubernetes resource management mostly relies on optimization modelling frameworks and heuristic-based algorithms. Improving these and proposing new resource management algorithms is a very promising area of research. Future work may focus on addressing the challenges of managing complex, dynamic workloads across distributed, heterogeneous environments. This may involve developing more sophisticated algorithms and techniques for workload placement, resource allocation, and load balancing, as well as exploring new approaches to containerization and virtualization. Additionally, there may be opportunities to leverage emerging technologies like edge computing and 5G networks to enable more efficient and scalable resource management in Kubernetes.
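A representative example of the heuristic family mentioned above is first-fit-decreasing bin packing, a common baseline for workload placement: sort pods by resource request, largest first, and place each on the first node with room. A minimal sketch (the pod and node figures are illustrative):

```python
def first_fit_decreasing(pods, node_capacity, num_nodes):
    """Assign pod CPU requests to nodes, largest pods first.
    Returns a mapping node index -> list of placed requests,
    or None if some pod cannot be scheduled."""
    free = [node_capacity] * num_nodes
    placement = {i: [] for i in range(num_nodes)}
    for pod in sorted(pods, reverse=True):
        for i in range(num_nodes):
            if free[i] >= pod:
                free[i] -= pod
                placement[i].append(pod)
                break
        else:
            return None   # no node has enough free capacity
    return placement
```

Research directions then include replacing such static heuristics with adaptive ones that account for multiple resource dimensions and dynamic arrival patterns.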
-
Most of the work done in the area of Kubernetes scheduling has been evaluated on small clusters. However, results obtained on small clusters may not generalize to production-scale deployments. One future research direction in Kubernetes scheduling is therefore to use larger cluster sizes for algorithm evaluation. While Kubernetes has been shown to be effective in managing clusters of up to several thousand nodes, there is a need to evaluate its performance at even larger cluster sizes. This includes evaluating the scalability of the Kubernetes scheduler, identifying potential bottlenecks, and proposing solutions to address them. Additionally, there is a need to evaluate the impact of larger cluster sizes on application performance and resource utilization. This research could lead to the development of more efficient scheduling algorithms and better management strategies for large-scale Kubernetes deployments.
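One scalability lever that already exists in kube-scheduler is the `percentageOfNodesToScore` configuration knob: on large clusters the scheduler stops searching once enough feasible nodes have been found, trading placement optimality for throughput. The sketch below mimics that trade-off in simplified form; the exact formula is an assumption here, not the upstream heuristic:

```python
import math

def nodes_to_inspect(num_nodes, percentage):
    """Simplified model of how many nodes a scheduler examines per pod.
    Small clusters are always searched exhaustively; large clusters are
    sampled, with a floor of 100 candidates (an assumed minimum)."""
    if percentage >= 100 or num_nodes <= 100:
        return num_nodes
    return max(100, math.ceil(num_nodes * percentage / 100))
```

Evaluating how such sampling affects placement quality and tail latency on multi-thousand-node clusters is exactly the kind of large-scale experiment the literature currently lacks.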
-
Scheduling should not be considered only from the static infrastructure point of view. Advanced context-aware scheduling algorithms may be proposed that take into account a broader range of contextual factors, such as user preferences, application dependencies, and environmental conditions. This may involve exploring new machine learning techniques and optimization algorithms that dynamically adapt to changing conditions and prioritize resources based on real-time feedback and analysis. Other potential areas of research include new models and frameworks for managing resources in Kubernetes clusters, improved container orchestration and load balancing, and enhanced monitoring and analytics capabilities to enable more effective use of context-aware scheduling algorithms.
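Such context-aware scheduling can be framed as a weighted combination of normalized contextual signals. The sketch below (the signal names, weights, and node values are illustrative assumptions) ranks nodes by a composite of free capacity, proximity to the user, and data locality:

```python
def composite_score(signals, weights):
    """Weighted sum of contextual signals, each normalized to [0, 1],
    where a higher value is better for placement."""
    assert set(signals) == set(weights)
    return sum(weights[k] * signals[k] for k in signals)

def rank_nodes(node_signals, weights):
    """Return node names sorted best-first by composite score."""
    return sorted(node_signals,
                  key=lambda n: composite_score(node_signals[n], weights),
                  reverse=True)

weights = {"free_capacity": 0.5, "user_proximity": 0.3, "data_locality": 0.2}
nodes = {
    "edge-1":  {"free_capacity": 0.4, "user_proximity": 0.9, "data_locality": 0.8},
    "cloud-1": {"free_capacity": 0.9, "user_proximity": 0.3, "data_locality": 0.5},
}
```

The open research question is how to learn the weights themselves, for example from real-time feedback, rather than fixing them by hand as done here.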