23 changes: 23 additions & 0 deletions docs-site/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*



35 changes: 35 additions & 0 deletions docs-site/README.md
@@ -0,0 +1,35 @@
# Volcano Website Migration (Hugo → Docusaurus)

This repository contains my ongoing migration work for the Volcano website as part of my LFX application.

🔗 Live Preview (Vercel)
https://volcano-docusaurus-migration.vercel.app

## What has been migrated so far

### Docusaurus Setup
- Full Docusaurus site initialized
- Navbar + footer recreated
- Tried to match the original Hugo site's font and styling

### Documentation
- Versioning system configured
- Latest version fully migrated (v1.13)
- Docs structure recreated
- All images added and linked

### Blog
- Blog section migrated
- Blog cards & styling matched with original Hugo site
- All images added and linked

### Features implemented
- Local search plugin integrated
- Dark mode styling aligned

## Work in Progress
- Remaining docs versions migration
- Chinese i18n fixes
- Broken links cleanup

This repo is a **migration preview**, not final production code.
36 changes: 36 additions & 0 deletions docs-site/blog/1.4 release-en.md
@@ -0,0 +1,36 @@
---
title: "Volcano v1.4 (Beta) Release Note"
description: "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware"
authors: ["volcano"]
date: 2021-09-13
tags: ["release", "volcano", "cncf"]
---

>This article was first published on `Container Cube` on September 6, 2021; see [Volcano v1.4.0-Beta发布,支持NUMA-Aware等多个重要特性](https://mp.weixin.qq.com/s/S5JAQI0uLoTEx0lvYDXM4Q)


Volcano, CNCF's first batch computing project, is now available with a new version, v1.4 (Beta). This version includes multiple important features, such as resource ratio-based partitions on GPU nodes, NUMA-aware, mixed deployment of multiple schedulers, and greatly improved stability.
<!-- truncate -->

__Resource ratio-based partitions on GPU nodes__ was developed to prevent GPUs from sitting idle while GPU-consuming jobs starve. This is an important feature contributed by Leinao Cloud, a Volcano community member.

Previously, there were two ways to handle scarce resources such as GPUs alongside common resources such as CPUs. Either CPU-consuming jobs were scheduled directly to GPU nodes, consuming CPU and memory without reserving anything for upcoming GPU jobs, or an independent scheduler was configured for GPU nodes, which did not allow CPU-consuming jobs to be scheduled to them at all.

Now with resource ratio-based partitions, you can set a dominant resource (usually GPU) and configure a resource ratio (for example, GPU:CPU:Memory = 1:4:32) for the dominant resource. The scheduler ensures that the ratio of idle GPU, CPU, and memory resources on a GPU node is greater than or equal to the value you set.

In this way, GPU-consuming jobs that meet the ratio requirement can be scheduled to the node at any time, preventing GPU waste. Compared with other solutions in the industry, this more flexible method improves node resource utilization.
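The idea behind the ratio check can be sketched in a few lines of Python. This is a hypothetical illustration of the admission rule described above, not Volcano's actual implementation; the function name and the example numbers are assumptions:

```python
# Configured dominant-resource ratio: GPU:CPU:Memory = 1:4:32.
RATIO_CPU_PER_GPU = 4
RATIO_MEM_PER_GPU = 32  # GiB

def can_place_cpu_job(idle_gpu, idle_cpu, idle_mem, job_cpu, job_mem):
    """A CPU-only job fits on a GPU node only if, after placement, the
    remaining idle CPU and memory still cover every idle GPU at the
    configured ratio (so future GPU jobs are never starved)."""
    cpu_left = idle_cpu - job_cpu
    mem_left = idle_mem - job_mem
    return (cpu_left >= idle_gpu * RATIO_CPU_PER_GPU
            and mem_left >= idle_gpu * RATIO_MEM_PER_GPU)

# A node with 2 idle GPUs, 16 idle CPUs, 96 GiB idle memory:
print(can_place_cpu_job(2, 16, 96, 4, 16))   # 12 CPUs / 80 GiB remain for 2 GPUs -> True
print(can_place_cpu_job(2, 16, 96, 10, 16))  # only 6 CPUs would remain for 2 GPUs -> False
```

The second job is rejected because placing it would leave fewer than 4 CPUs per idle GPU, which is exactly the reservation the feature guarantees.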

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md.


__CPU NUMA-aware__ is another important feature of this version. For computing-intensive jobs such as AI and big data jobs, enabling NUMA will significantly improve the computing efficiency. With CPU NUMA-aware scheduling, you can configure the NUMA policy to determine whether to enable NUMA for workloads. The scheduler will select a node that meets the NUMA requirements.

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md.

You can now __deploy different types of schedulers__ in a Kubernetes cluster to properly schedule resources. The most common use case is deploying default-scheduler and Volcano together. Native Kubernetes resource objects, such as Deployments and StatefulSets, can be scheduled by default-scheduler, and high-performance computing workloads, such as Volcano Jobs, TensorFlow Jobs, and Spark Jobs, can be scheduled by Volcano. This solution makes the best possible use of each type of scheduler and reduces the concurrency pressure on a single scheduler.

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-scheduler.md.

In addition to the preceding features, Volcano v1.4 (Beta) adds a stress-testing automation framework and improves the robustness of the resource comparison functions.

The community is collecting roadmap features for Volcano v1.5. We have received requirements for cluster resource monitoring, hierarchical queues, enhanced Spark integration, and task dependency support. Your suggestions and issues are always welcome.
110 changes: 110 additions & 0 deletions docs-site/blog/ING_case-en.md
@@ -0,0 +1,110 @@
---
title: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
description: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
authors: ["volcano"]
date: 2022-12-28
tags: ["case-study", "big-data", "analytics", "banking"]
---

>On October 26, 2022, Krzysztof Adamski and Tinco Boekestijn from ING Group delivered a keynote speech "Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano" at KubeCon North America. The speech focused on how Volcano, a cloud native batch computing project, supports high-performance scheduling for big data analytics jobs on ING's data management platform.
More details: [KubeCon + CloudNativeCon North America](https://events.linuxfoundation.org/archive/2022/kubecon-cloudnativecon-north-america/program/schedule/)

<!-- truncate -->
## Introduction to ING

Internationale Nederlanden Groep (ING), a global financial institution of Dutch origin, was created in 1991 with the merger of Dutch insurer Nationale-Nederlanden and national postal bank NMB Postbank.

ING provides services in more than 40 countries around the world. Its core businesses are banking, insurance, and asset management. Its 56,000 employees serve 53.2 million customers worldwide, including individuals, families, businesses, governments, and organizations such as the IMF.


## Business Background

Regulations and restrictions on banking vary depending on the country/region. Data silos, data security, and compliance requirements can be really challenging. It is not easy to introduce new technologies. Therefore, ING builds their Data Analytics Platform (DAP) to provide secure, self-service functionality for employees to manage services throughout the entire process.


In 2013, they conceptualized the data platform. In 2018, ING introduced cloud native technologies to upgrade its infrastructure platform. Since then, more and more employees and departments have turned to the platform, which by now hosts more than 400 projects.

They aim to meet all analytics needs in a highly secure, self-service platform that has the following features:
- Open source tool model
- Powerful computing
- Strict security and compliance measures
- One platform for all
- Both global and local


## Challenges and Solutions

ING is shifting from Hadoop to Kubernetes. They met some challenges in job management and multi-framework support. For example:

- Job management
- Pod scheduling: Unaware of upper-layer applications.
- Lack of fine-grained lifecycle management
- Lack of dependencies of tasks and jobs
- Scheduling
- Lack of job-based scheduling, such as sorting, priority, preemption, fair scheduling, and resource reservation
- No advanced scheduling algorithms, such as those based on CPU topology, task topology, IO-awareness, and backfilling
- Lack of resource sharing among jobs, queues, and namespaces
- Multi-framework support
- Insufficient support for frameworks such as TensorFlow and PyTorch
- Complex management of each framework (such as resource planning and sharing)

Managing applications (stateless and even stateful ones) with Kubernetes would be a perfect choice if Kubernetes were as user-friendly as Yarn in scheduling and managing batch computing jobs. Yarn, however, provides only limited support for frameworks such as TensorFlow and PyTorch. Therefore, ING looked for a better solution.

__Kubernetes + Hadoop__
When managing clusters, ING once separated Hadoop and Kubernetes. They ran almost all Spark jobs in Hadoop clusters and other tasks and algorithms in Kubernetes clusters, but wanted to run all jobs in Kubernetes clusters to simplify management.

When Kubernetes and Yarn work together, Kubernetes and Hadoop resources are statically divided. During office hours, Hadoop applications and Kubernetes each use their own resources, so Spark tasks under heavy pressure cannot be allocated extra resources. At night, only batch processing tasks run in the clusters, yet the idle Kubernetes resources cannot be allocated to Hadoop. In this case, resources are not fully used.


__Kubernetes with Volcano__
When managing clusters with Kubernetes and scheduling Spark tasks with Volcano, resources do not need to be statically divided. Cluster resources can be dynamically re-allocated based on the priorities and resource pressure of pods, batch tasks, and interactive tasks, which greatly improves the overall utilization of cluster resources.

For example, during office hours, idle resources of common service applications can be used by batch and interactive applications temporarily. In holidays or nights, batch applications can use all cluster resources for data computing.

Volcano is a batch scheduling engine developed for Kubernetes with the following capabilities:

- Job queues with weighted priority
- Able to commit above queue limits if the cluster has spare capacity
- Able to preempt pods when more pods come in
- Configurable strategies to deal with competing workloads
- Compatible with Yarn scheduling

Volcano supplements Kubernetes in batch scheduling. Since Apache Spark 3.3, Volcano has become the default batch scheduler of Spark on Kubernetes, making it easier to install and use.

## Highlighted Features
__Redundancy and Local Affinity__
Volcano retains the affinity and anti-affinity policies for pods in Kubernetes, and adds those for tasks.

__Dominant Resource Fairness (DRF)__
The idea of DRF is that in a multi-resource environment, resource allocation should be determined by the dominant share of an entity (user or queue). The volcano-scheduler observes the dominant resource requested by each job and uses it as a measure of cluster resource usage. Based on this dominant resource, the volcano-scheduler calculates the share of the job. The job with a lower share has a higher scheduling priority.

For example, a cluster has 18 CPUs and 72 GB memory in total. User1 and User2 are each allocated one queue. Any submitted job will get its scheduling priority based on the dominant resource.

- For User1, the CPU share is 0.33 (6/18), the memory share is 0.33 (24/72), and the final share is 0.33.
- For User2, the CPU share is 0.67 (12/18), the memory share is 0.33 (24/72), and the final share is 0.67.

Under a DRF policy, the job with a lower share will be first scheduled, that is, the job committed by User1.
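The share calculation above can be written out as a short Python sketch. This is only an illustration of the DRF arithmetic in the example (the function name and per-user requests of 6 CPUs/24 GB and 12 CPUs/24 GB are inferred from the fractions above), not volcano-scheduler code:

```python
def drf_share(cpu_req, mem_req, total_cpu=18, total_mem=72):
    """The dominant share is the larger of a job's per-resource shares
    against the cluster totals (18 CPUs, 72 GB memory here)."""
    return max(cpu_req / total_cpu, mem_req / total_mem)

user1 = drf_share(6, 24)   # max(6/18, 24/72) = 0.33
user2 = drf_share(12, 24)  # max(12/18, 24/72) = 0.67

# The job with the lower dominant share is scheduled first.
first = "User1" if user1 < user2 else "User2"
print(round(user1, 2), round(user2, 2), first)  # 0.33 0.67 User1
```

Because User1's dominant share (0.33) is lower than User2's (0.67), User1's job gets scheduling priority, exactly as in the example.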

Queue resources in a cluster can be divided by configuring weights. However, overcommitted tasks in a queue can use the idle resources in other queues. In this example, after using up the CPUs of its own queue, User2 can use the idle CPUs of User1. When User1 commits a new task, it triggers resource preemption and reclaims the resources occupied by other queues.

__Resource Reservation__
Batch computing tasks and other services may compete for resources and cause conflicts. Assume a cluster has two available nodes and we need to deploy a unified service layer to provide services externally, such as Presto or a cache service like Alluxio. If batch computing tasks have already taken all the resources, we cannot deploy or upgrade that service layer. Therefore, ING's platform now allows users to reserve some resources for such services.

__DRF Dashboard__
ING built a DRF scheduling dashboard based on the monitoring data from Volcano to obtain scheduling data at different layers. In the service cluster, ING stores the tasks of interactive users in one queue, and the computing tasks of all key projects running on the data platform in another. ING can move some resources from other queues to the key project queue, but that comes at the cost of the interactive users' tasks.

ING is considering displaying the peak hours of cluster use to provide users with more information. With this, users can decide when to start their tasks based on the cluster resource readiness, improving computing performance without complex configurations in the background.

## Summary
Volcano abstracts batch task scheduling, allowing Kubernetes to better serve ING's task scheduling needs. ING will contribute the functions they have developed back to the community, such as the DRF dashboard, idle resource reservation on each node, auto queue management, new Prometheus monitoring metrics, Grafana dashboard updates, kube-state-metrics updates, and cluster role restrictions.

@@ -0,0 +1,55 @@
---
title: "Meet Cloud Native Batch Computing with Volcano in AI & Big Data Scenarios"
description: "Join Volcano at KubeCon + CloudNativeCon Europe, 19-22 March in Paris!"
authors: ["volcano"]
date: 2024-03-08
tags: ["kubecon", "batch-computing", "ai", "big-data", "volcano"]
---

Volcano is a cloud native batch computing engine designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and it supports mainstream general computing frameworks. More than 58,000 developers worldwide have joined the project, including in-house contributors from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. The project has 3.7k+ stars and 800+ forks, and has been proven in mass data computing and analytics scenarios such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving, with more developers and use cases coming up.
![](/img/blog/volcano_logo.svg)

<!-- truncate -->
As the industry's first cloud native batch computing project, Volcano was open-sourced at KubeCon Shanghai in June 2019 and became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 developers worldwide have committed code to the project. The community is seeing growing popularity among developers, partners, and users.

### Try new features in Volcano v1.8.2

Volcano's latest release, v1.8.2, adds the following new features:

- **Support for vGPU scheduling and isolation**

- **Support for vGPU and user-defined resource preemption capabilities**

- **Addition of JobFlow workflow scheduling engine**

- **Node load-aware scheduling and rescheduling support for diverse monitoring systems**

- **Optimization of Volcano’s ability to schedule microservices**

- **Optimization of Volcano charts packages for publishing and archiving**

Try Volcano v1.8.2: https://github.com/volcano-sh/volcano/releases/tag/v1.8.2


### Join Volcano Community Co-construction Program
Recently, more than 50 Volcano use cases have been put into production. They span industries such as Internet services, advanced manufacturing, finance, life sciences, scientific research, autonomous driving, and medicine, covering massive data computing and analysis scenarios like AI, big data, genomic sequencing, and rendering. Users include Tencent, Amazon, ING Bank, Baidu, Xiaohongshu, DiDi, 360, iQIYI, Leinao, Pengcheng Laboratory, Cruise, Li Auto, Unisound, Ximalaya, Vipshop, GrandOmics, BOSS Zhipin, and more. With the expansion of the Volcano ecosystem, more and more users are eager to join the community.

The Volcano community launched the co-construction program to welcome users into the Volcano community, to accelerate cloud native progress, and to ensure a diverse Volcano ecosystem.

Through this program, you will have opportunities for technological guidance, promotion, and online and offline technical sharing. If your company or organization recognizes the value Volcano has to offer, wants help using Volcano, or wants to build its technological influence, consider joining the program.
For details about the requirements and benefits, see https://github.com/volcano-sh/community/blob/master/community-building-program.md


### Join Volcano at KubeCon + CloudNativeCon Europe, 19-22 March in Paris!
![](/img/blog/2024-paris.png)
Volcano will participate in several activities, including:

- Speech Schedule
  - March 19, 14:05 - 14:30 CET: Level 7.3 | Room S03
  Volcano maintainer Kevin Wang, Huawei, presents “Efficient Multi-Cluster GPU Workload Management with Karmada and Volcano”
  - March 22, 11:55 - 12:30 CET: Pavilion 7 | Level 7.3 | N03
  Volcano maintainers William Wang, Huawei, and Mengxuan Li, 4paradigm, present “Cloud Native Batch Computing with Volcano: Updates and Future”
  - March 22, 16:00 - 16:35 CET: Pavilion 7 | Level 7.3 | Paris Room
  Volcano maintainers William Wang and Hongcai Ren, Huawei, present “Maximizing GPU Utilization Over Multi-Cluster: Challenges and Solutions for Cloud-Native AI Platform”
- Booth Hours
  - March 20-22 (Wed-Fri) afternoons: Stop by CNCF Project Pavilion Booth PP18-B at KubeCon + CloudNativeCon Europe to speak with an expert or see a demo!