feat(import): resume Hugging Face downloads#1223
Conversation
Signed-off-by: houyuwushang <liuluoqianqiu@outlook.com>
There was a problem hiding this comment.
Pull request overview
Adds resumable downloads for Hugging Face imports (kit import --tool=hf) by reusing a deterministic per-snapshot cache directory and resuming partial files via HTTP Range requests, while keeping the existing “clean import cache on success” behavior.
Changes:
- Reuse deterministic import cache directories (keyed by repo type + repo + pinned commit SHA) to enable resume across runs.
- Implement download resume/skip/restart logic in the HF downloader (Range requests, restart on
200 OKto a Range request, skip fully cached files). - Add unit tests covering deterministic cache dir reuse and the downloader resume/restart/skip behaviors.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/lib/filesystem/cache/cache.go | Allows deterministic cache directories to be created/reopened via MkdirAll. |
| pkg/lib/filesystem/cache/cache_test.go | Adds a test ensuring deterministic cache directories are reusable and preserve partial contents. |
| pkg/lib/external/hf/download.go | Adds resumable download behavior using Range requests and cached file offsets. |
| pkg/lib/external/hf/download_test.go | Adds tests for resume, restart when Range is ignored, and skipping already-cached files. |
| pkg/cmd/kitimport/hfimport.go | Switches HF import temp directory to a deterministic cache dir keyed by pinned snapshot. |
| pkg/cmd/kitimport/hfimport_test.go | Adds a determinism/safety test for the HF import cache key. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if cacheKey != "" { | ||
| cacheDir = filepath.Join(cacheSubDir, cacheKey) | ||
| if err := os.Mkdir(cacheDir, 0700); err != nil { | ||
| if err := os.MkdirAll(cacheDir, 0700); err != nil { | ||
| return "", nil, fmt.Errorf("failed to create cache directory %s: %w", cacheDir, err) | ||
| } | ||
| } else { |
There was a problem hiding this comment.
Good catch. I added cache key validation so deterministic cache keys must be a single relative path element, rejecting absolute paths, path separators, ./.., and cleaned path changes before calling MkdirAll. I also added regression coverage for unsafe cache keys.
Signed-off-by: houyuwushang <liuluoqianqiu@outlook.com>
Description
Adds resumable downloads for Hugging Face imports that use
kit import --tool=hf.This change:
Rangerequests when cached bytes already existRangeand returns200 OKSuccessful imports still clean the import cache through the existing cleanup path.
Linked issues
Fixes #763
AI-Assisted Code
Tests
go test -count=1 ./pkg/lib/external/hf ./pkg/cmd/kitimport ./pkg/lib/filesystem/cachego test -race -count=1 ./pkg/lib/external/hf ./pkg/cmd/kitimport ./pkg/lib/filesystem/cachego test -count=1 ./pkg/...go test ./... -run TestDoesNotExistgo build -o kit.exegit diff --cached --checkgo run github.com/google/addlicense@latest --check -s -l apache -c "The KitOps Authors." pkg/lib/external/hf/download_test.go pkg/lib/filesystem/cache/cache_test.go