Skip to content

feat(import): resume Hugging Face downloads#1223

Open
houyuwushang wants to merge 2 commits into
kitops-ml:mainfrom
houyuwushang:hf-import-resume-downloads
Open

feat(import): resume Hugging Face downloads#1223
houyuwushang wants to merge 2 commits into
kitops-ml:mainfrom
houyuwushang:hf-import-resume-downloads

Conversation

@houyuwushang

@houyuwushang houyuwushang commented Jun 24, 2026

Copy link
Copy Markdown

Description

Adds resumable downloads for Hugging Face imports that use kit import --tool=hf.

This change:

  • reuses a deterministic import cache directory for each pinned Hugging Face snapshot
  • resumes partial files with HTTP Range requests when cached bytes already exist
  • restarts a file download if the server ignores Range and returns 200 OK
  • skips files that are already fully cached
  • lets deterministic cache directories be opened more than once, matching the documented resume behavior

Successful imports still clean the import cache through the existing cleanup path.

Linked issues

Fixes #763

AI-Assisted Code

  • This PR contains AI-generated code that I have reviewed and tested
  • I take full responsibility for all code in this PR, regardless of how it was created

Tests

  • go test -count=1 ./pkg/lib/external/hf ./pkg/cmd/kitimport ./pkg/lib/filesystem/cache
  • go test -race -count=1 ./pkg/lib/external/hf ./pkg/cmd/kitimport ./pkg/lib/filesystem/cache
  • go test -count=1 ./pkg/...
  • go test ./... -run TestDoesNotExist
  • go build -o kit.exe
  • git diff --cached --check
  • go run github.com/google/addlicense@latest --check -s -l apache -c "The KitOps Authors." pkg/lib/external/hf/download_test.go pkg/lib/filesystem/cache/cache_test.go

Signed-off-by: houyuwushang <liuluoqianqiu@outlook.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds resumable downloads for Hugging Face imports (kit import --tool=hf) by reusing a deterministic per-snapshot cache directory and resuming partial files via HTTP Range requests, while keeping the existing “clean import cache on success” behavior.

Changes:

  • Reuse deterministic import cache directories (keyed by repo type + repo + pinned commit SHA) to enable resume across runs.
  • Implement download resume/skip/restart logic in the HF downloader (Range requests, restart on 200 OK to a Range request, skip fully cached files).
  • Add unit tests covering deterministic cache dir reuse and the downloader resume/restart/skip behaviors.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/lib/filesystem/cache/cache.go Allows deterministic cache directories to be created/reopened via MkdirAll.
pkg/lib/filesystem/cache/cache_test.go Adds a test ensuring deterministic cache directories are reusable and preserve partial contents.
pkg/lib/external/hf/download.go Adds resumable download behavior using Range requests and cached file offsets.
pkg/lib/external/hf/download_test.go Adds tests for resume, restart when Range is ignored, and skipping already-cached files.
pkg/cmd/kitimport/hfimport.go Switches HF import temp directory to a deterministic cache dir keyed by pinned snapshot.
pkg/cmd/kitimport/hfimport_test.go Adds a determinism/safety test for the HF import cache key.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 55 to 60
if cacheKey != "" {
cacheDir = filepath.Join(cacheSubDir, cacheKey)
if err := os.Mkdir(cacheDir, 0700); err != nil {
if err := os.MkdirAll(cacheDir, 0700); err != nil {
return "", nil, fmt.Errorf("failed to create cache directory %s: %w", cacheDir, err)
}
} else {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I added cache key validation so deterministic cache keys must be a single relative path element, rejecting absolute paths, path separators, ./.., and cleaned path changes before calling MkdirAll. I also added regression coverage for unsafe cache keys.

Signed-off-by: houyuwushang <liuluoqianqiu@outlook.com>

@gorkem gorkem left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gorkem gorkem requested a review from amisevsk June 25, 2026 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable resumable downloads for kit imports that use the huggingface API

3 participants