FrameworkProcessor source_dir wrong behavior #5735

@d-vesely

Description

PySDK Version

  • PySDK V2 (2.x)
  • PySDK V3 (3.x)

Describe the bug
It used to be possible to create a pipeline step with FrameworkProcessor by providing a source_dir (an archive in S3 containing the entrypoint Python file as well as other dependencies/code) and code (the name of the entrypoint Python file). The source_dir S3 URI would then get mapped into /opt/ml/processing/input/code/, and a runproc.sh script would be created and uploaded to serve as the ProcessingJob's entrypoint (install requirements if present, then run the Python file inside /opt/ml/processing/input/code/).
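
For reference, the v2-generated runproc.sh behaved roughly like the following sketch. This is an illustrative paraphrase, not the SDK's actual template; the helper name and paths are only for demonstration:

```python
def build_runproc_sh(entrypoint: str) -> str:
    """Sketch of a v2-style runproc.sh: install requirements if present,
    then execute the entrypoint from the unpacked code directory.
    Illustrative paraphrase only, not the SDK's real template.
    """
    return "\n".join([
        "#!/bin/bash",
        # source_dir is unpacked here by the processing job input mapping.
        "cd /opt/ml/processing/input/code/",
        # Install dependencies only if a requirements.txt shipped in source_dir.
        "[ -f requirements.txt ] && pip install -r requirements.txt",
        f"python3 {entrypoint}",
    ])

print(build_runproc_sh("evaluate.py"))
```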

This is no longer possible. It should be possible to pass an S3 URI as source_dir, but it isn't: the code in _package_code only makes sense when source_dir is a local directory that is then packaged and uploaded. We need the previous behavior to remain available in order to fully migrate from v2 to v3.

To reproduce
Define a FrameworkProcessor and pass an S3 URI to source_dir:

script_evaluation = FrameworkProcessor(
    image_uri=self.image_uris["train"],
    command=["python3"],
    instance_type=self.instances["processing_type"],
    instance_count=self.instances["processing_count"],
    base_job_name=base_job_name,
    output_kms_key=self.aws_params["kms_key_hub"],
    volume_kms_key=self.aws_params["kms_key"],
    network_config=self.network_config,
    env={
        "RANDOM_STATE": self.pipeline_params["RandomState"].to_string(),
        **self.default_env_vars,
    },
    role=self.aws_params["exec_role"],
    sagemaker_session=self.pipeline_session,
    tags=self.tags,
)

step_evaluation_args = script_evaluation.run(
    code="evaluate.py",  # <--- the Python entrypoint; evaluate.py imports from other Python files within source_dir
    source_dir=self.s3_uri_sourcedir,  # <--- an S3 URI, which causes the problem
    # add inputs and outputs as needed
    arguments=None,
)

Expected behavior
The same behavior as in v2: source_dir should be mapped into /opt/ml/processing/input/code/, and a runproc.sh entrypoint should be created and uploaded that runs the Python file given in code (which is part of source_dir). source_dir should not be looked up locally, packaged, and uploaded when it is already an S3 URI.
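
One way _package_code could restore this is to branch on whether source_dir is already an S3 URI. A minimal sketch of that check (the helper name and return shape are hypothetical; the real fix belongs inside the SDK):

```python
def resolve_source_dir(source_dir: str) -> tuple[str, bool]:
    """Hypothetical helper: return the code-archive location and whether a
    local packaging/upload step is still required.

    If source_dir is already an S3 URI, it should be used as-is (v2 behavior);
    otherwise it is a local path that must be packaged and uploaded first.
    """
    if source_dir.lower().startswith("s3://"):
        # Already in S3: map it directly into /opt/ml/processing/input/code/.
        return source_dir, False
    # Local directory: package and upload before the job can use it.
    return source_dir, True

print(resolve_source_dir("s3://my-bucket/code/sourcedir.tar.gz"))
print(resolve_source_dir("./src"))
```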

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 3.7.1

Additional context
This is a roadblock for my company for the migration from v2 to v3.
