Commit 8c938df

Merge pull request #25 from embulk/article-installing-maven-style-embulk-plugins
New article: Installing Maven-style Embulk plugins
2 parents 910e264 + 3d7c935 commit 8c938df

1 file changed

Lines changed: 197 additions & 0 deletions

@@ -0,0 +1,197 @@
1+
---
2+
layout: posts
3+
title: "Installing Maven-style Embulk plugins"
4+
date: 2024-06-13
5+
description: "We recently started providing a couple of methods that make installing Maven-style Embulk plugins easier, which was admittedly not easy when Maven-style plugins were first introduced. This article briefly introduces those methods."
6+
author: "dmikurube"
7+
---
8+
9+
Since [Embulk v0.11.0 was released a year ago](https://github.com/embulk/embulk/releases/tag/v0.11.0), we have been promoting the new Maven-style Embulk plugins over the legacy RubyGems-style plugins.
10+
11+
See also: [Embulk v0.11 is coming soon: JRuby](https://www.embulk.org/articles/2023/04/13/embulk-v0.11-is-coming-soon.html#jruby)
12+
13+
We recently started providing a couple of methods that make installing Maven-style Embulk plugins easier. Installation was admittedly not easy when Maven-style plugins were first introduced.
14+
15+
This article briefly introduces the methods for installing Maven-style Embulk plugins.
16+
17+
## Revisit: Embulk home
18+
19+
Embulk now has the concept of an "Embulk home" directory, which contains `embulk.properties` and Embulk plugin installations. Maven-style Embulk plugins are also installed in the Embulk home directory.
20+
21+
See again: [Embulk v0.11 is coming soon: Embulk home](https://www.embulk.org/articles/2023/04/13/embulk-v0.11-is-coming-soon.html#embulk-home)
22+
23+
## #1: Embulk's built-in subcommand `install`
24+
25+
[Embulk v0.11.3](https://github.com/embulk/embulk/releases/tag/v0.11.3) introduced a new Embulk subcommand, `embulk install`, as the Maven-style counterpart of `embulk gem install` for RubyGems-style plugins. This subcommand takes a Maven artifact notation as its argument. The example below installs [`org.embulk:embulk-input-s3:0.6.0` from Maven Central](https://central.sonatype.com/artifact/org.embulk/embulk-input-s3/0.6.0).
26+
27+
```
28+
$ java -jar embulk-0.11.3.jar install "org.embulk:embulk-input-s3:0.6.0"
29+
...
30+
...
31+
2024-06-13 15:46:11.537 +0900 [INFO] (main): The path "/home/user/.embulk/lib/m2/repository" (m2_repo) does not exist. Creating it as a directory.
32+
2024-06-13 15:46:11.619 +0900 [INFO] (main): No alternative remote Maven repositories are specified. Downloading artifacts from Maven Central.
33+
2024-06-13 15:46:11.633 +0900 [INFO] (main): Downloading org.embulk:embulk-input-s3:pom:0.6.0 from https://repo.maven.apache.org/maven2
34+
2024-06-13 15:46:12.725 +0900 [INFO] (main): Downloaded org.embulk:embulk-input-s3:pom:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.pom
35+
2024-06-13 15:46:12.776 +0900 [INFO] (main): Downloading com.amazonaws:aws-java-sdk-s3:pom:1.11.466 from https://repo.maven.apache.org/maven2
36+
2024-06-13 15:46:13.027 +0900 [INFO] (main): Downloaded com.amazonaws:aws-java-sdk-pom:pom:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-pom/1.11.466/aws-java-sdk-pom-1.11.466.pom
37+
...
38+
...
39+
2024-06-13 15:46:14.857 +0900 [INFO] (main): Downloading org.embulk:embulk-input-s3:jar:0.6.0 from https://repo.maven.apache.org/maven2
40+
2024-06-13 15:46:14.857 +0900 [INFO] (main): Downloading com.amazonaws:aws-java-sdk-s3:jar:1.11.466 from https://repo.maven.apache.org/maven2
41+
...
42+
...
43+
2024-06-13 15:46:15.720 +0900 [INFO] (main): Downloaded org.embulk:embulk-input-s3:jar:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.jar
44+
2024-06-13 15:46:15.721 +0900 [INFO] (main): Downloaded com.amazonaws:aws-java-sdk-s3:jar:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-s3/1.11.466/aws-java-sdk-s3-1.11.466.jar
45+
...
46+
...
47+
2024-06-13 15:46:15.730 +0900 [INFO] (main): Installed org.embulk:embulk-input-s3:jar:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.jar
48+
2024-06-13 15:46:15.730 +0900 [INFO] (main): Installed com.amazonaws:aws-java-sdk-s3:jar:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-s3/1.11.466/aws-java-sdk-s3-1.11.466.jar
49+
...
50+
...
51+
```
52+
53+
This subcommand also downloads the dependencies of the specified Maven artifact transitively, as you can see in the example above.
54+
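The paths in the log above follow the standard Maven repository layout under `lib/m2/repository`. As a minimal sketch of that mapping, the shell function below turns a Maven artifact notation into the relative JAR path. The function name `artifact_to_repo_path` is hypothetical and not part of Embulk, and the sketch assumes a plain `group:artifact:version` notation without a classifier.

```shell
#!/bin/sh
# Sketch: map "group:artifact:version" to the JAR path relative to the
# local Maven repository (standard Maven layout, as seen in the log above).
# "artifact_to_repo_path" is a hypothetical helper, not an Embulk command.
artifact_to_repo_path() {
    coords=$1
    group=${coords%%:*}      # text before the first colon
    rest=${coords#*:}        # everything after the first colon
    artifact=${rest%%:*}     # text between the first and second colon
    version=${rest#*:}       # text after the second colon
    # Dots in the group ID become directory separators.
    group_dir=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/%s/%s/%s-%s.jar\n' \
        "$group_dir" "$artifact" "$version" "$artifact" "$version"
}

artifact_to_repo_path "org.embulk:embulk-input-s3:0.6.0"
# prints: org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.jar
```

Prefixing the result with the Embulk home and `lib/m2/repository` gives the location reported in the "Installed" log lines.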
55+
Note that you can change the destination Embulk home directory with Embulk's standard options, as in the example below.
56+
57+
```
58+
$ java -jar embulk-0.11.3.jar -Xembulk_home=/tmp/foo install "org.embulk:embulk-input-s3:0.6.0"
59+
...
60+
61+
$ env EMBULK_HOME=/tmp/bar java -jar embulk-0.11.3.jar install "org.embulk:embulk-input-s3:0.6.0"
62+
...
63+
```
64+
65+
Unfortunately, it currently supports only [Maven Central](https://central.sonatype.com/) as the remote repository.
66+
67+
## #2: Out-of-Embulk Embulk plugin installer
68+
69+
Embulk has had the `mkbundle` subcommand and the `-b` option so that users can maintain plugin installations with a `Gemfile`, but they work only for RubyGems-style plugins.
70+
71+
[The Gradle `org.embulk.runset` plugin](https://github.com/embulk/gradle-embulk-runset) is an alternative for Maven-style Embulk plugins. It works entirely outside the Embulk package.
72+
73+
To use it, first set up a [Gradle](https://gradle.org/install/) environment. [Gradle 8.7](https://docs.gradle.org/8.7/userguide/userguide.html) or later is required. In typical use cases, you may want to use [the Gradle wrapper](https://docs.gradle.org/8.7/userguide/userguide.html).
74+
75+
Next, write a `build.gradle` that declares the Maven-style Embulk plugins you want to install.
76+
77+
```
78+
plugins {
79+
id "org.embulk.runset" version "0.2.0" // Just apply this Gradle plugin.
80+
}
81+
82+
repositories {
83+
mavenCentral()
84+
}
85+
86+
installEmbulkRunSet {
87+
// Set your Embulk home directory (absolute path) to install the Embulk plugins and "embulk.properties".
88+
embulkHome file("/home/user/my-embulk-home")
89+
90+
// Specify the Maven-style Embulk plugin by the "artifact" directive.
91+
artifact "org.embulk:embulk-input-s3:0.6.0"
92+
93+
// You can specify multiple versions of the same Embulk plugin so that you can choose the version at runtime.
94+
// You can also specify an artifact with the split-style notation.
95+
artifact group: "org.embulk", name: "embulk-input-s3", version: "0.5.3"
96+
97+
// Specify this if you need JRuby.
98+
// It downloads jruby-complete-9.1.15.0.jar, and sets the "jruby" Embulk System Property in "embulk.properties".
99+
jruby "org.jruby:jruby-complete:9.1.15.0"
100+
101+
// Specify this if you need to set some Embulk System Properties manually.
102+
// It sets the "key" Embulk System Property to "value" in "embulk.properties".
103+
embulkSystemProperty "key", "value"
104+
}
105+
```
106+
107+
Then, run `gradle installEmbulkRunSet` (or `./gradlew installEmbulkRunSet` if you use the Gradle wrapper) to set up the Embulk home.
108+
109+
```
110+
$ ./gradlew installEmbulkRunSet
111+
112+
> Configure project :
113+
Supplied embulkHome "/home/user/my-embulk-home" does not exist, then will be created.
114+
Setting to copy org.embulk:embulk-input-s3:0.6.0:jar into org/embulk/embulk-input-s3/0.6.0
115+
Setting to copy com.amazonaws:aws-java-sdk-s3:1.11.466:jar into com/amazonaws/aws-java-sdk-s3/1.11.466
116+
...
117+
...
118+
Setting to copy org.embulk:embulk-input-s3:0.5.3:jar into org/embulk/embulk-input-s3/0.5.3
119+
...
120+
...
121+
Setting to copy org.embulk:embulk-input-s3:0.5.3:pom into org/embulk/embulk-input-s3/0.5.3
122+
...
123+
...
124+
Setting to copy org.jruby:jruby-complete:9.1.15.0:jar into org/jruby/jruby-complete/9.1.15.0
125+
126+
BUILD SUCCESSFUL in 2s
127+
1 actionable task: 1 executed
128+
```
129+
130+
The Embulk System Properties file `embulk.properties` is also generated automatically in the specified Embulk home.
131+
132+
```
133+
#Generated by the "org.embulk.embulk-runset" Gradle plugin.
134+
#Thu Jun 13 16:53:31 JST 2024
135+
key=value
136+
jruby=file\:///home/user/my-embulk-home/lib/m2/repository/org/jruby/jruby-complete/9.1.15.0/jruby-complete-9.1.15.0.jar
137+
```
138+
139+
## Run!
140+
141+
With either installation method, you can run Embulk with the installed Maven-style Embulk plugins.
142+
143+
See the example `s3_with_maven.yml` below.
144+
145+
```yaml
146+
in:
147+
# The full-style type notation for Maven-style Embulk plugins.
148+
type:
149+
source: maven
150+
group: org.embulk
151+
name: s3
152+
version: 0.6.0
153+
bucket: ...
154+
parser:
155+
type: csv
156+
...
157+
out:
158+
type: stdout
159+
```
160+
161+
Then, run Embulk!
162+
163+
```
164+
$ java -jar embulk-0.11.4.jar -Xembulk_home=/home/user/my-embulk-home run s3_with_maven.yml
165+
2024-06-13 17:01:55.373 +0900 [INFO] (main): embulk_home is set from command-line: /home/user/my-embulk-home
166+
2024-06-13 17:01:55.378 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /home/user/my-embulk-home/lib/m2/repository
167+
2024-06-13 17:01:55.378 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /home/user/my-embulk-home/lib/gems
168+
2024-06-13 17:01:55.378 +0900 [INFO] (main): gem_path is set empty.
169+
2024-06-13 17:01:55.378 +0900 [DEBUG] (main): Embulk system property "default_guess_plugin" is set to: "gzip,bzip2,json,csv"
170+
2024-06-13 17:01:55.634 +0900 [INFO] (main): Started Embulk v0.11.4
171+
2024-06-13 17:01:55.811 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-s3 (maven:org.embulk:s3:0.6.0)
172+
...
173+
...
174+
2024-06-13 17:01:55.948 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-stdout
175+
...
176+
...
177+
2024-06-13 17:01:56.052 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-csv
178+
...
179+
...
180+
2024-06-13 17:01:56.691 +0900 [INFO] (0001:transaction): Start listing file with prefix [******]
181+
2024-06-13 17:01:57.577 +0900 [INFO] (0001:transaction): Found total [1] files
182+
2024-06-13 17:01:57.721 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=16 / output tasks 8 = input tasks 1 * 8
183+
2024-06-13 17:01:57.759 +0900 [INFO] (0001:transaction): {done: 0 / 1, running: 0}
184+
...
185+
...
186+
1,foo
187+
2,bar
188+
3,baz
189+
2024-06-13 17:01:58.602 +0900 [INFO] (0001:transaction): {done: 1 / 1, running: 0}
190+
2024-06-13 17:01:58.603 +0900 [INFO] (0001:transaction): Incremental job, setting last_path to [******.csv]
191+
2024-06-13 17:01:58.618 +0900 [INFO] (0001:transaction): Embulk system property "plugins.output.stdout" is not set.
192+
2024-06-13 17:01:58.619 +0900 [INFO] (0001:transaction): Embulk system property "plugins.default.output.stdout" is not set.
193+
2024-06-13 17:01:58.621 +0900 [INFO] (main): Committed.
194+
2024-06-13 17:01:58.629 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"******.csv"},"out":{}}
195+
```
196+
197+
We hope these installation methods help you.