Commit f4b9a91

Catalog Migrator: Remove AWS SDK dependencies for runtime (#133)
The runtime jar was 650 MB because of the bundled AWS SDK; it is now around 118 MB (it still packs hadoop-aws). The initial goal was a standalone jar, but the only cloud dependency it bundled was the AWS SDK. A jar that is standalone for all systems would also need the GCP and Azure dependencies, which would bloat the runtime jar further. Hence, these dependencies are excluded, and users need to provide them on the classpath based on the storage type (similar to the Iceberg runtime jars).
1 parent eafaff8 commit f4b9a91

5 files changed

Lines changed: 47 additions & 60 deletions

iceberg-catalog-migrator/cli/BUNDLE-LICENSE

Lines changed: 0 additions & 7 deletions
@@ -1244,13 +1244,6 @@ License: Apache License, Version 2.0 - http://www.apache.org/licenses/LICENSE-2.
 
 --------------------------------------------------------------------------------
 
-This artifact bundles Amazon AWS SDK.
-
-Project URL: https://aws.amazon.com/sdkforjava
-License: Apache License, Version 2.0 - http://www.apache.org/licenses/LICENSE-2.0.txt
-
---------------------------------------------------------------------------------
-
 This artifact bundles Stax API.
 
 Project URL: http://stax.codehaus.org/

iceberg-catalog-migrator/cli/BUNDLE-NOTICE

Lines changed: 0 additions & 29 deletions
@@ -133,33 +133,4 @@ This artifact bundles Project Nessie with the following in its NOTICE:
 | Nessie
 | Copyright 2015-2025 Dremio Corporation
 
--------------------------------------------------------------------------
-
-This artifact bundles Amazon AWS SDK with the following in its NOTICE:
-| AWS SDK for Java 2.0
-| Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
-|
-| This product includes software developed by
-| Amazon Technologies, Inc (http://www.amazon.com/).
-|
-| **********************
-| THIRD PARTY COMPONENTS
-| **********************
-| This software includes third party software subject to the following copyrights:
-| - XML parsing and utility functions from JetS3t - Copyright 2006-2009 James Murty.
-| - PKCS#1 PEM encoded private key parsing and utility functions from oauth.googlecode.com - Copyright 1998-2010 AOL Inc.
-| - Apache Commons Lang - https://github.com/apache/commons-lang
-| - Netty Reactive Streams - https://github.com/playframework/netty-reactive-streams
-| - Jackson-core - https://github.com/FasterXML/jackson-core
-| - Jackson-dataformat-cbor - https://github.com/FasterXML/jackson-dataformats-binary
-|
-| The licenses for these third party components are included in LICENSE.txt
-|
-| - For Apache Commons Lang see also this required NOTICE:
-| Apache Commons Lang
-| Copyright 2001-2020 The Apache Software Foundation
-|
-| This product includes software developed at
-| The Apache Software Foundation (https://www.apache.org/).
-
 -------------------------------------------------------------------------

iceberg-catalog-migrator/cli/build.gradle.kts

Lines changed: 1 addition & 11 deletions
@@ -47,17 +47,7 @@ dependencies {
   implementation("org.apache.iceberg:iceberg-hive-metastore")
   implementation("org.apache.iceberg:iceberg-nessie")
   implementation("org.apache.iceberg:iceberg-dell")
-  implementation(libs.hadoop.aws) { exclude("com.amazonaws", "aws-java-sdk-bundle") }
-  // AWS dependencies based on https://iceberg.apache.org/docs/latest/aws/#enabling-aws-integration
-  runtimeOnly(libs.aws.sdk.apache.client)
-  runtimeOnly(libs.aws.sdk.auth)
-  runtimeOnly(libs.aws.sdk.glue)
-  runtimeOnly(libs.aws.sdk.s3)
-  runtimeOnly(libs.aws.sdk.dynamo)
-  runtimeOnly(libs.aws.sdk.kms)
-  runtimeOnly(libs.aws.sdk.lakeformation)
-  runtimeOnly(libs.aws.sdk.sts)
-  runtimeOnly(libs.aws.sdk.url.connection.client)
+  implementation(libs.hadoop.aws) { exclude(group = "software.amazon.awssdk") }
 
   // needed for Hive catalog
   runtimeOnly("org.apache.hive:hive-metastore:${libs.versions.hive.get()}") {
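
One way to check that the exclusion in this hunk keeps AWS SDK v2 classes out of the built CLI jar is to scan the jar listing for the `software/amazon/awssdk/` package prefix. The sketch below is an illustration only and not part of this commit: the helper name `count_awssdk_entries` and the default jar path are hypothetical, and it assumes the build does not relocate (shade) package names. A count of 0 would suggest that no AWS SDK v2 classes are packed into the jar.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): reads a jar listing on stdin
# and prints how many entries sit under the AWS SDK v2 package prefix.
count_awssdk_entries() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # there are no matches, so "|| true" keeps the helper usable under "set -e".
  grep -c '^software/amazon/awssdk/' || true
}

# Assumed jar location; adjust to wherever the build writes the CLI jar.
JAR_PATH="${JAR_PATH:-cli/build/libs/iceberg-catalog-migrator-cli-0.1.0-SNAPSHOT.jar}"

if [ -f "$JAR_PATH" ]; then
  # "jar tf" lists the entries of an archive, one path per line.
  jar tf "$JAR_PATH" | count_awssdk_entries
else
  echo "jar not found: $JAR_PATH" >&2
fi
```

The same listing piped through `ls -lh "$JAR_PATH"` style size checks can be compared against the sizes quoted in the commit message, which are the author's figures and are not reproduced by this sketch.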

iceberg-catalog-migrator/docs/object-store-access-configuration.md

Lines changed: 46 additions & 3 deletions
@@ -21,16 +21,59 @@
 
 This document provides a guide on how to configure access to object stores for the Iceberg Catalog Migrator.
 
+## Required Dependencies
+
+The Iceberg Catalog Migrator CLI jar does not include cloud provider dependencies to keep the distribution size small.
+Users must supplement the appropriate Iceberg object store bundle jar based on the object store being used.
+
+Download the required bundle jar from [Maven Central](https://repo1.maven.org/maven2/org/apache/iceberg/)
+
 ## AWS S3
-For AWS, you can use the following environment variables:
+
+### Required Dependencies
+Users must include the Iceberg AWS bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0-SNAPSHOT.jar:iceberg-aws-bundle-x.x.x.jar \
+org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+[your-options]
+```
+
+For more information on AWS integration, refer to the [Iceberg AWS documentation](https://iceberg.apache.org/docs/nightly/aws/#enabling-aws-integration).
+
+### Environment Variables
+For AWS, use the following environment variables:
 ```shell
 export AWS_ACCESS_KEY_ID=xxxxxxx
 export AWS_SECRET_ACCESS_KEY=xxxxxxx
 export AWS_S3_ENDPOINT=xxxxxxx
 ```
 
-## ADLS
-For ADLS, you can use the following environment variables:
+## Azure Data Lake Storage (ADLS)
+
+### Required Dependencies
+Users must include the Iceberg Azure bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-azure-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0.jar:iceberg-azure-bundle-x.x.x.jar \
+org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+[your-options]
+```
+
+### Environment Variables
+For ADLS, use the following environment variables:
 ```shell
 export AZURE_SAS_TOKEN=xxxxxxx
 ```
+
+## Google Cloud Storage (GCS)
+
+### Required Dependencies
+Users must include the Iceberg GCP bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0.jar:iceberg-gcp-bundle-x.x.x.jar \
+org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+[your-options]
+```
+
+## Notes
+- Replace `x.x.x` with the Iceberg version matching the release version of the migrator tool.
+- Multiple bundle jars can be included if users need to access multiple cloud providers.
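
The last note in this documentation change, about combining bundle jars, can be sketched as follows. This is an illustrative sketch rather than part of the commit: `build_classpath` is a hypothetical helper, the jar file names are the placeholders used in the documentation, and the script prints the resulting command instead of running it. It assumes a Unix-like system, where `:` is the classpath separator.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments with ':' to form a JVM classpath.
# The function body runs in a subshell, so the IFS change does not leak out.
build_classpath() (
  IFS=':'
  printf '%s\n' "$*"
)

# Placeholder jar names taken from the documentation; replace x.x.x with the
# Iceberg version matching the release version of the migrator tool.
MIGRATOR_CP=$(build_classpath \
  iceberg-catalog-migrator-cli-0.1.0.jar \
  iceberg-aws-bundle-x.x.x.jar \
  iceberg-gcp-bundle-x.x.x.jar)

# Print the command instead of executing it, since the jars are placeholders.
echo "java -cp $MIGRATOR_CP org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register [your-options]"
```

Building the classpath from a list keeps each bundle on its own line, which makes it easier to add or drop a cloud provider without editing one long `-cp` string.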

iceberg-catalog-migrator/gradle/libs.versions.toml

Lines changed: 0 additions & 10 deletions
@@ -19,7 +19,6 @@
 
 [versions]
 assertj = "3.27.3"
-aws = "2.33.0" # this is in mapping with iceberg repo.
 checkstyle = "10.21.3"
 errorprone = "2.36.0"
 errorproneSlf4j = "0.1.28"
@@ -43,15 +42,6 @@ testcontainers = "1.21.3"
 
 [libraries]
 assertj = { module = "org.assertj:assertj-core", version.ref = "assertj" }
-aws-sdk-apache-client = { module = "software.amazon.awssdk:apache-client", version.ref = "aws" }
-aws-sdk-auth = { module = "software.amazon.awssdk:auth", version.ref = "aws" }
-aws-sdk-dynamo = { module = "software.amazon.awssdk:dynamodb", version.ref = "aws" }
-aws-sdk-glue = { module = "software.amazon.awssdk:glue", version.ref = "aws" }
-aws-sdk-kms = { module = "software.amazon.awssdk:kms", version.ref = "aws" }
-aws-sdk-lakeformation = { module = "software.amazon.awssdk:lakeformation", version.ref = "aws" }
-aws-sdk-sts = { module = "software.amazon.awssdk:sts", version.ref = "aws" }
-aws-sdk-s3 = { module = "software.amazon.awssdk:s3", version.ref = "aws" }
-aws-sdk-url-connection-client = { module = "software.amazon.awssdk:url-connection-client", version.ref = "aws" }
 checkstyle = { module = "com.puppycrawl.tools:checkstyle", version.ref = "checkstyle" }
 errorprone-annotations = { module = "com.google.errorprone:error_prone_annotations", version.ref = "errorprone" }
 errorprone-core = { module = "com.google.errorprone:error_prone_core", version.ref = "errorprone" }
