Agent-scheduler: UpdateSnapshot panic #5167

@yccharles

Description

Run Agent-scheduler with this config:

--node-worker-threads=5
--scheduler-worker-count=4
--scheduler-name=agent-scheduler

nodeInfo.node is nil, which leads to a panic at:

snapshot.AddOrUpdateNodes(nodesToUpdate)
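
One possible mitigation is to skip entries whose node pointer is still nil before they reach the snapshot update. The following is only a minimal sketch of that idea, not the Volcano code: `Node`, `NodeInfo`, and `filterSyncedNodes` are hypothetical, simplified stand-ins, and the real types in `pkg/agentscheduler` may look different.

```go
package main

import "fmt"

// Node and NodeInfo are simplified, hypothetical stand-ins for the real
// Kubernetes and Volcano types; the field names are assumptions.
type Node struct{ Name string }

type NodeInfo struct {
	Name string
	Node *Node // may still be nil if the node has not been synced yet
}

// filterSyncedNodes returns only the entries whose Node pointer is set,
// plus the names of the entries that were skipped because Node was nil.
func filterSyncedNodes(in []*NodeInfo) (ready []*NodeInfo, skipped []string) {
	for _, ni := range in {
		if ni == nil {
			continue // nothing to report for a nil entry
		}
		if ni.Node == nil {
			skipped = append(skipped, ni.Name)
			continue
		}
		ready = append(ready, ni)
	}
	return ready, skipped
}

func main() {
	nodesToUpdate := []*NodeInfo{
		{Name: "node-a", Node: &Node{Name: "node-a"}},
		{Name: "node-b"}, // not yet synced, Node is nil
	}
	ready, skipped := filterSyncedNodes(nodesToUpdate)
	// Only `ready` would be handed to the snapshot update.
	fmt.Println("ready:", len(ready), "skipped:", skipped)
}
```

A guard like this would only hide the symptom; the skipped nodes would still have to be picked up on a later snapshot update.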

Steps to reproduce the issue

  1. Have 1500+ nodes in the k8s cluster.
  2. Use this Agent-scheduler config:
--node-worker-threads=5
--scheduler-worker-count=4
--scheduler-name=agent-scheduler
  3. Submit 1000+ pods that need to be scheduled by agent-scheduler.
  4. Start up Agent-scheduler. It panics.

Describe the results you received and expected

Maybe processSyncNode executes too slowly: it is still running when worker.runOnce() has already begun.

https://github.com/volcano-sh/volcano/blob/master/pkg/agentscheduler/cache/cache.go#L772

func (sc *SchedulerCache) runNodeWorker() {
	for sc.processSyncNode() {
	}
}

What version of Volcano are you using?

master

Any other relevant information

No response

Metadata

Assignees

Labels

kind/bug (Categorizes issue or PR as related to a bug.), priority/high

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
