Bug report:
- A NotFound error appears every time a file is imported from /data01 to /data02/dragonfly (the Dragonfly rootDir):
# time(dfcache import --content-for-calculating-task-id 7788 /data01/dragonfly/download/qwen-7b-test-DeepSeek-R1-Distill-Qwen-7B-1.tar --console --ttl 15m)
Importing Failed!
*********************************
Bad Code: Internal error
Message: status: NotFound, message: "persistent cache peer {xxxx} not found", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
real 0m21.191s
user 0m0.004s
sys 0m0.008s
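- Checking the TTL in Redis shows that the peer key's TTL is unusually short (a few seconds, even though the import was run with --ttl 15m), so the key expires before the import finishes and the NotFound error follows: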
127.0.0.1:6379> TTL "scheduler: scheduler-clusters:1:persistent-cache-hosts:{worker-2}:persistent-cache-peers-for-persistent-cache-task"
(integer) 7
127.0.0.1:6379> TTL "scheduler: scheduler-clusters:1:persistent-cache-hosts:{worker-2}:persistent-cache-peers-for-persistent-cache-task"
(integer) 6
127.0.0.1:6379> TTL "scheduler: scheduler-clusters:1:persistent-cache-hosts:1{worker-2}:persistent-cache-peers-for-persistent-cache-task"
(integer) 6
127.0.0.1:6379> TTL "scheduler: scheduler-clusters:1:persistent-cache-hosts:{worker-2}:persistent-cache-peers-for-persistent-cache-task"
(integer) 5
- The TTL is set in scheduler/resource/persistentcache/peer_manager.go#240:
-- Add peer ID to the task joint-set
redis.call("SADD", task_peers_key, peer_id)
redis.call("EXPIRE", task_peers_key, ttl_seconds)
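- But the Lua script arguments are passed in the wrong order: the script reads ARGV[11] as the TTL and ARGV[12] as the concurrent piece count, while the Go caller passes them the other way around: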
local ttl_seconds = tonumber(ARGV[11])
local concurrent_piece_count = ARGV[12]
args := []any{
	peer.ID,                             // ARGV[1]
	peer.Persistent,                     // ARGV[2]
	string(finishedPieces),              // ARGV[3]
	peer.FSM.Current(),                  // ARGV[4]
	string(blockParents),                // ARGV[5]
	peer.Task.ID,                        // ARGV[6]
	peer.Host.ID,                        // ARGV[7]
	peer.Cost.Nanoseconds(),             // ARGV[8]
	peer.CreatedAt.Format(time.RFC3339), // ARGV[9]
	peer.UpdatedAt.Format(time.RFC3339), // ARGV[10]
	peer.ConcurrentPieceCount,           // ARGV[11] <-- should be ttl
	remainingTTLSeconds,                 // ARGV[12] <-- should be concurrent_piece_count
}
- So the concurrent piece count (8) is used as the TTL: the key expires after 8 seconds, which causes the NotFound error.
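A minimal sketch of the corrected ordering, assuming the Lua script keeps reading ARGV[11] as the TTL and ARGV[12] as the concurrent piece count. The `peer` struct and the `buildStoreArgs` helper are hypothetical stand-ins written for this report, not the scheduler's actual types; the real fix may simply swap the last two entries in the existing `args` slice.

```go
package main

import (
	"fmt"
	"time"
)

// peer is a hypothetical, trimmed-down stand-in for the scheduler's
// persistent cache peer; only the fields used as script arguments are kept.
type peer struct {
	ID                   string
	Persistent           bool
	State                string
	TaskID               string
	HostID               string
	Cost                 time.Duration
	CreatedAt            time.Time
	UpdatedAt            time.Time
	ConcurrentPieceCount int64
}

// buildStoreArgs (hypothetical helper) returns the script arguments in the
// order the Lua script reads them: ARGV[11] is the TTL in seconds and
// ARGV[12] is the concurrent piece count.
func buildStoreArgs(p peer, finishedPieces, blockParents []byte, remainingTTLSeconds int64) []any {
	return []any{
		p.ID,                              // ARGV[1]
		p.Persistent,                      // ARGV[2]
		string(finishedPieces),            // ARGV[3]
		p.State,                           // ARGV[4]
		string(blockParents),              // ARGV[5]
		p.TaskID,                          // ARGV[6]
		p.HostID,                          // ARGV[7]
		p.Cost.Nanoseconds(),              // ARGV[8]
		p.CreatedAt.Format(time.RFC3339),  // ARGV[9]
		p.UpdatedAt.Format(time.RFC3339),  // ARGV[10]
		remainingTTLSeconds,               // ARGV[11] ttl_seconds
		p.ConcurrentPieceCount,            // ARGV[12] concurrent_piece_count
	}
}

func main() {
	now := time.Now()
	p := peer{
		ID: "peer-1", Persistent: true, State: "Succeeded",
		TaskID: "task-1", HostID: "worker-2", Cost: time.Second,
		CreatedAt: now, UpdatedAt: now, ConcurrentPieceCount: 8,
	}
	// --ttl 15m from the failing command, expressed in seconds.
	ttl := int64((15 * time.Minute).Seconds())
	args := buildStoreArgs(p, []byte("[]"), []byte("[]"), ttl)
	fmt.Println("ARGV[11] ttl_seconds:", args[10])
	fmt.Println("ARGV[12] concurrent_piece_count:", args[11])
}
```

Building the slice in one helper and asserting on positions 11 and 12 in a unit test is one way to keep the Go caller and the Lua script from drifting apart again.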
Expected behavior:
The Lua script argument order should be fixed so that the TTL is passed as ARGV[11] and the concurrent piece count as ARGV[12].
How to reproduce it:
dfcache import {a large file across disk} --console
Environment:
- Dragonfly version: 2.4.3
- OS: linux
- Kernel (e.g. uname -a):
- Others: