_episodes/02-submit-jobs-w-justin.md
Lines changed: 9 additions & 9 deletions
@@ -1,27 +1,27 @@
 ---
-title: Submit grid jobs with JustIn
+title: New justIN Job Submission System
 teaching: 20
 exercises: 0
 questions:
-- How to submit realistic grid jobs with JustIn
+- How to submit realistic grid jobs with justIN
 objectives:
-- Demonstrate use of [justIn](https://dunejustin.fnal.gov) for job submission with more complicated setups.
+- Demonstrate use of [justIN](https://dunejustin.fnal.gov) for job submission with more complicated setups.
 keypoints:
 - Always, always, always prestage input datasets. No exceptions.
 ---

-# PLEASE USE THE NEW [justIn](https://dunejustin.fnal.gov) SYSTEM INSTEAD OF POMS
+# PLEASE USE THE NEW [justIN](https://dunejustin.fnal.gov) SYSTEM INSTEAD OF POMS

-__A simple [justIn](https://dunejustin.fnal.gov) Tutorial is currently in docdb at: [JustIn Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=30145)__
+__A simple [justIN](https://dunejustin.fnal.gov) Tutorial is currently in docdb at: [justIN Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=30145)__

 A more detailed tutorial is available at:
-[JustIn Docs](https://dunejustin.fnal.gov/docs/)
+[justIN Docs](https://dunejustin.fnal.gov/docs/)

-The [justIn](https://dunejustin.fnal.gov) system is described in detail at:
+The [justIN](https://dunejustin.fnal.gov) system is described in detail at:
_episodes/07-grid-job-submission.md
Lines changed: 21 additions & 25 deletions
@@ -1,5 +1,5 @@
 ---
-title: Jobsub Grid Job Submission and Common Errors - still 2024 version
+title: Jobsub Grid Job Submission and Common Errors (SPECIAL PURPOSE)
 teaching: 65
 exercises: 0
 questions:
@@ -68,8 +68,8 @@ The past few months have seen significant changes in how DUNE (as well as other
 First, log in to a `dunegpvm` machine . Then you will need to set up the job submission tools (`jobsub`). If you set up `dunesw` it will be included, but if not, you need to do

 ~~~
-mkdir -p /pnfs/dune/scratch/users/${USER}/DUNE_tutorial_sep2025 # if you have not done this before
 As you can see, we have switched from the hard-coded directories to directories defined by environment variables; the `INPUT_TAR_DIR_LOCAL` variable will be set for us (see below).
-Now, let's actually create our tar file. Again assuming you are in `/exp/dune/app/users/kherner/sep2025tutorial/`:
+Now, let's actually create our tar file. Again assuming you are in `/exp/dune/app/users/kherner/jan2026tutorial/`:
 ```bash
-tar --exclude '.git' -czf sep2025tutorial.tar.gz sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof sep2025tutorial/work setupsep2025tutorial-grid.sh
+tar --exclude '.git' -czf jan2026tutorial.tar.gz jan2026tutorial/localProducts_larsoft_${DUNESW_VERSION}_${DUNESW_QUALIFIER} jan2026tutorial/work setupjan2026tutorial-grid.sh
 ```
 Note how we have excluded the contents of ".git" directories in the various packages, since we don't need any of that in our jobs. It turns out that the .git directory can sometimes account for a substantial fraction of a package's size on disk!

 Then submit another job (in the following we keep the same submit file as above):
 You'll see this is very similar to the previous case, but there are some new options:

-*`--tar_file_name=dropbox://` automatically **copies and untars** the given tarball into a directory on the worker node, accessed via the INPUT_TAR_DIR_LOCAL environment variable in the job. The value of INPUT_TAR_DIR_LOCAL is by default $CONDOR_DIR_INPUT/name_of_tar_file_without_extension, so if you have a tar file named e.g. sep2025tutorial.tar.gz, it would be $CONDOR_DIR_INPUT/sep2025tutorial.
+*`--tar_file_name=dropbox://` automatically **copies and untars** the given tarball into a directory on the worker node, accessed via the INPUT_TAR_DIR_LOCAL environment variable in the job. The value of INPUT_TAR_DIR_LOCAL is by default $CONDOR_DIR_INPUT/name_of_tar_file_without_extension, so if you have a tar file named e.g. jan2026tutorial.tar.gz, it would be $CONDOR_DIR_INPUT/jan2026tutorial.
 * Notice that the `--append_condor_requirements` line is longer now, because we also check for the fifeuser[1-4]. opensciencegrid.org CVMFS repositories.

 The submission output will look something like this:
@@ -265,7 +263,7 @@ Could not locate uploaded file on RCDS. Will retry in 30 seconds.
 Could not locate uploaded file on RCDS. Will retry in 30 seconds.
 Found uploaded file on RCDS.
 Transferring files to web sandbox...
-Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/run_sep2025tutorial.sh [DONE] after 0s
+Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/run_jan2026tutorial.sh [DONE] after 0s
 Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/simple.cmd [DONE] after 0s
 Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/simple.sh [DONE] after 0s
 Submitting job(s).
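The tar command in the 07 episode uses `--exclude '.git'`, and the new keypoints say to copy only what you need into input tar files. Before uploading, it can help to list what a tarball actually contains. The sketch below builds a throwaway area with made-up names (`demo_area`, `pkg`, `demo.tar.gz` are hypothetical, not part of the tutorial), applies the same exclusion pattern, and counts any `.git` entries that slipped through. It assumes GNU tar, where an unanchored pattern like `.git` matches that path component anywhere in the tree.

```bash
#!/bin/bash
set -euo pipefail

# Throwaway area mimicking a development area; all names here are made up.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo_area/pkg/.git" "$workdir/demo_area/pkg/src"
echo 'ref: refs/heads/main' > "$workdir/demo_area/pkg/.git/HEAD"
echo 'int main() { return 0; }' > "$workdir/demo_area/pkg/src/main.cc"

# Same exclusion pattern as the tutorial's tar command: skip any .git directory.
tar --exclude '.git' -czf "$workdir/demo.tar.gz" -C "$workdir" demo_area

# List what would actually be shipped to the grid.
tar -tzf "$workdir/demo.tar.gz"

# Count entries under a .git directory (grep -c exits 1 on zero matches, hence || true).
git_entries=$(tar -tzf "$workdir/demo.tar.gz" | grep -c '/\.git/' || true)
echo "entries under .git: ${git_entries}"

# Compressed size of what would be uploaded.
du -h "$workdir/demo.tar.gz"

rm -rf "$workdir"
```

For a real area, run the same `tar -tzf ... | grep` check on your own tarball (e.g. the jan2026tutorial one) before submitting.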
@@ -566,8 +564,6 @@ Some more background material on these topics (including some examples of why ce

 [Wiki page listing differences between jobsub_lite and legacy jobsub](https://fifewiki.fnal.gov/wiki/Differences_between_jobsub_lite_and_legacy_jobsub_client/server)

-[DUNE Computing Tutorial:Advanced topics and best practices](DUNE_computing_tutorial_advanced_topics_20210129)
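The `--tar_file_name=dropbox://` bullet in the 07 episode gives the default rule INPUT_TAR_DIR_LOCAL = $CONDOR_DIR_INPUT/name_of_tar_file_without_extension. Here is a minimal sketch of that rule. It assumes the stripped "extension" is `.tar.gz`, as in the jan2026tutorial example. The helper `expected_input_tar_dir` and the `/srv/condor_input` value are hypothetical and only illustrate the naming. Inside a real job, read `$INPUT_TAR_DIR_LOCAL` directly, since jobsub sets it for you.

```bash
#!/bin/bash
# Hypothetical helper (not part of jobsub): predict the default
# INPUT_TAR_DIR_LOCAL for a tarball passed as --tar_file_name=dropbox://...,
# following the stated rule $CONDOR_DIR_INPUT/<tar file name without extension>.
# Assumption: the "extension" is .tar.gz, as in the jan2026tutorial example.
expected_input_tar_dir() {
    local tarfile=$1
    # basename drops any leading directories and the .tar.gz suffix
    echo "${CONDOR_DIR_INPUT}/$(basename "$tarfile" .tar.gz)"
}

# Placeholder value for illustration only; HTCondor sets this on the worker node.
CONDOR_DIR_INPUT=/srv/condor_input
expected_input_tar_dir /exp/dune/app/users/kherner/jan2026tutorial.tar.gz
# -> /srv/condor_input/jan2026tutorial

# In a real job script, use the variable jobsub provides instead, e.g.
#   source "${INPUT_TAR_DIR_LOCAL}/setupjan2026tutorial-grid.sh"
```

Because the tutorial's tar command puts `setupjan2026tutorial-grid.sh` at the top level of the tarball, that script would sit directly under the unpacked directory.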
+- Submit a basic batch job and understand what's happening behind the scenes
+- Monitor the job and look at its outputs
+- Review best practices for submitting jobs (including what NOT to do)
+keypoints:
+- When in doubt, ask! Understand that policies and procedures that seem annoying, overly complicated, or unnecessary (especially when compared to running an interactive test) are there to ensure efficient operation and scalability. They are also often the result of someone breaking something in the past, or of simpler approaches not scaling well.
+- Send test jobs after creating new workflows or making changes to existing ones. If things don't work, don't blindly resubmit and expect things to magically work the next time.
+- Only copy what you need in input tar files. In particular, avoid copying log files, .git directories, temporary files, etc. from interactive areas.
+- Take care to follow best practices when setting up input and output file locations.
+- Always, always, always prestage input datasets. No exceptions.
+---
+
+<!-- > ## Note:
+> This section describes basic job submission. Large-scale submission of jobs that read DUNE data files is described in the [next section]({{ site.baseurl }}/08-submit-jobs-w-justin/index.html). -->
+<!--
+#### Session Video
+
+This session will be captured on video and placed here after the workshop for asynchronous study.
+The session was video captured for your asynchronous review.
+The video from the two-day version of this training in May 2022 is provided [here](https://www.youtube.com/embed/QuDxkhq64Og) as a reference. -->
+
+<!--
+<center>
+<iframe width="560" height="315" src="https://www.youtube.com/embed/QuDxkhq64Og" title="DUNE Computing Tutorial May 2022 Grid Job Submission and Common Errors" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</center>
+-->
+
+
+
+
+
+Once you have practiced basic justIN commands, please look at the instructions for running your own code below:
+
+
+
+## First learn the basics of justIN: submit a job
+
+Go to [The justIN Tutorial](https://dunejustin.fnal.gov/docs/tutorials.dune.md)
+
+and work up to ["run some hello world jobs"](https://dunejustin.fnal.gov/docs/tutorials.dune.md#run-some-hello-world-jobs)
+
+> ## Quiz
+>
+> 1. What is your workflow ID?
+>
+{: .solution}
+
+Then work through
+
+- [View your workflow on the justIN web dashboard](https://dunejustin.fnal.gov/docs/tutorials.dune.md#view-your-workflow-on-the-justin-web-dashboard)
+- [Jobs with inputs and outputs](https://dunejustin.fnal.gov/docs/tutorials.dune.md#jobs-with-inputs-and-outputs)
+- [Fetching files from Rucio managed storage](https://dunejustin.fnal.gov/docs/tutorials.dune.md#fetching-files-from-rucio-managed-storage)
+- (skip for now) Jobs using GPUs
+- [Jobs writing to scratch](https://dunejustin.fnal.gov/docs/tutorials.dune.md#jobs-writing-to-scratch)
+
+
+
+
+
+## Submit a job using the tarball containing custom code
+
+
+
+First off, a very important point: for running analysis jobs, **you may not actually need to pass an input tarball**, especially if you are only using code from the base release without modifying any of it. In that case, it is much more efficient to use everything from the release and not use a tarball at all.
+All you need to do is set up any required software from CVMFS (e.g. dunetpc and/or protoduneana), and you are ready to go.
+If you're only modifying a fcl file, for example, but no code, it's more efficient to copy just the fcl(s) you're changing to the scratch directory within the job and edit them as part of your job script (copies of a fcl file in the current working directory have priority over others by default).
+
+Sometimes, though, we need to run some custom code that isn't in a release.
+We need a way to efficiently get code into jobs without overwhelming our data transfer systems.
+We have to make a few minor changes to the scripts you made in the previous tutorial section, generate a tarball, and invoke the proper jobsub options to get that into your job.
+There are many ways of doing this, but by far the best is to use the Rapid Code Distribution Service (RCDS), as shown in our example.
+
+
+### Temporary short version of an example for custom code
+
+We're working on a long version of this, but for now please look at these [instructions for running a justIN workflow using your own code]({{ site.baseurl }}/short_submission).
+
+### Cool justIN feature
+
+justIN has a very useful interactive test command.
+
+Here is a test from the short submission example.
+
+~~~
+{% include test_workflow.sh %}
+~~~
+
+It reads a tarball from the `$DUNEDATA` area and writes output to a temporary area on your interactive machine, which makes it a very good emulation of a grid job.
+
+## Did your job work?
+
+If not, please ask in the #computing-questions channel on Slack.
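The new episode notes that a fcl copy in the job's current working directory takes priority, so editing a local copy is enough when no code changes. The sketch below emulates that lookup order. It is a simplified illustration, not art's actual implementation. It assumes a top-level fcl name is resolved by checking an absolute path first, then the current directory, then each directory in the colon-separated `FHICL_FILE_PATH` in order. The function `resolve_fcl`, the file `myjob.fcl`, and the appended edit line are all hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: simplified emulation of how a top-level fcl name is
# resolved. Assumed order: absolute path, then the current directory, then each
# directory listed (colon-separated) in FHICL_FILE_PATH. Not art's actual code.
resolve_fcl() {
    local name=$1 dir
    if [[ $name == /* ]]; then
        # Absolute paths are used as-is.
        [[ -f $name ]] && echo "$name" && return 0
        return 1
    fi
    # A copy in the current working directory wins over everything else.
    if [[ -f ./$name ]]; then
        echo "./$name"
        return 0
    fi
    # Otherwise search FHICL_FILE_PATH entries in order; first match wins.
    local IFS=:
    for dir in ${FHICL_FILE_PATH:-}; do
        if [[ -n $dir && -f $dir/$name ]]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Demo with throwaway directories standing in for a release and a job's scratch dir.
demo=$(mktemp -d)
mkdir -p "$demo/release/fcl" "$demo/jobdir"
echo '# release version of myjob.fcl' > "$demo/release/fcl/myjob.fcl"
export FHICL_FILE_PATH="$demo/release/fcl"
cd "$demo/jobdir"

resolve_fcl myjob.fcl   # only the release copy exists, so that one is found

# As in a job script: copy the fcl into the working directory, then edit it there.
cp "$(resolve_fcl myjob.fcl)" .
echo '# local edit made by the job script' >> myjob.fcl

resolve_fcl myjob.fcl   # now the local ./myjob.fcl takes priority
```

The same pattern avoids shipping a tarball when only configuration changes. Remove `$demo` afterwards if you run this interactively.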