Large deletions with bulk API stuck running #8062

@ageorget

Hi,

For the LSST experiment, we are testing directory deletion using the bulk API (dCache 10.2.14).

A simple test with a few directories works fine:

{"activity":"DELETE",
 "expandDirectories":"ALL",
 "target":["/pnfs/in2p3.fr/lsst/users/ageorget/test/"
 ]}
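
For anyone who wants to reproduce this with other paths, here is a minimal sketch that builds the same request body. The function name `build_bulk_delete` is hypothetical, and the set of accepted `expandDirectories` values (`NONE`, `TARGETS`, `ALL`) is my assumption; only `ALL` was used in these tests. How the body is submitted (REST client, script, etc.) is left out on purpose.

```python
import json

# Assumed set of valid expansion modes; only "ALL" appears in this report.
EXPANSION_MODES = ("NONE", "TARGETS", "ALL")


def build_bulk_delete(targets, expand="ALL"):
    """Return the JSON body of a bulk DELETE request for the given paths."""
    if expand not in EXPANSION_MODES:
        raise ValueError(f"unexpected expandDirectories value: {expand!r}")
    if not targets:
        raise ValueError("at least one target path is required")
    body = {
        "activity": "DELETE",
        "expandDirectories": expand,
        "target": list(targets),
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Same target as the small test in this report.
    print(build_bulk_delete(["/pnfs/in2p3.fr/lsst/users/ageorget/test/"]))
```
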

> request ls
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
43           | 2026/04/03-10:08:52 | 2026/04/03-10:08:57 |   44098:1021 |  COMPLETED | e854ff58-5e02-432c-af39-db875b038a7e


> request info e854ff58-5e02-432c-af39-db875b038a7e
e854ff58-5e02-432c-af39-db875b038a7e:
status:           COMPLETED
arrived at:       2026-04-03 10:08:52.026
started at:       2026-04-03 10:08:52.044
last modified at: 2026-04-03 10:08:57.018
target prefix:    /
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2026-04-03 10:08:56.988   |   2026-04-03 10:08:56.988 |    2026-04-03 10:08:57.01 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test
2026-04-03 10:08:52.073   |   2026-04-03 10:08:52.073 |   2026-04-03 10:08:52.087 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/webdav-ccdcacli492Domain.access.tgz
2026-04-03 10:08:52.079   |   2026-04-03 10:08:52.079 |   2026-04-03 10:08:52.108 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test1/webdav-ccdcacli492Domain.access.tgz
2026-04-03 10:08:52.085   |   2026-04-03 10:08:52.085 |   2026-04-03 10:08:52.112 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test3/webdav-ccdcacli492Domain.access.tgz
2026-04-03 10:08:52.114   |   2026-04-03 10:08:52.114 |   2026-04-03 10:08:52.127 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test2/test22
2026-04-03 10:08:52.13    |    2026-04-03 10:08:52.13 |   2026-04-03 10:08:52.141 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test1
2026-04-03 10:08:52.144   |   2026-04-03 10:08:52.144 |    2026-04-03 10:08:56.34 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test3
2026-04-03 10:08:56.344   |   2026-04-03 10:08:56.344 |   2026-04-03 10:08:56.984 |    COMPLETED | /pnfs/in2p3.fr/lsst/users/ageorget/test/test2

But with a larger directory tree containing 5k files and 6k directories, the bulk service starts a few deletions and then stays stuck in the RUNNING state until I cancel the request:

> ll /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/
drwxrwx---  2 lsstgrid lsst 512 Apr  3 10:14 consolidateSingleVisitStar_config
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:46 consolidateSingleVisitStar_log
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:46 consolidateSingleVisitStar_metadata
drwxrwx---  2 lsstgrid lsst 512 Apr  3 10:14 consolidateVisitSummary_config
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:51 consolidateVisitSummary_log
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:51 consolidateVisitSummary_metadata
drwxrwx---  2 lsstgrid lsst 512 Apr  3 10:14 packages
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:51 preliminary_visit_summary
drwxrwx---  2 lsstgrid lsst 512 Apr  3 10:14 preliminary_visit_summary_schema
drwxrwx--- 24 lsstgrid lsst 512 Nov  3 10:46 single_visit_star

{"activity":"DELETE",
 "expandDirectories":"ALL",
 "target":["/pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/"
 ]}


> request ls
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
44           | 2026/04/03-10:14:30 | 2026/04/03-10:14:30 |   44098:1021 |    STARTED | 0d7c3806-0eb0-49da-a832-f2f2bd1fb40a

> request info 0d7c3806-0eb0-49da-a832-f2f2bd1fb40a
0d7c3806-0eb0-49da-a832-f2f2bd1fb40a:
status:           STARTED
arrived at:       2026-04-03 10:14:30.889
started at:       2026-04-03 10:14:30.907
last modified at: 2026-04-03 10:14:30.907
target prefix:    /
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2026-04-03 10:14:30.894   |   2026-04-03 10:14:30.894 |                         ? |      RUNNING | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z
2026-04-03 10:14:30.937   |   2026-04-03 10:14:30.937 |   2026-04-03 10:14:30.953 |    COMPLETED | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/consolidateVisitSummary_config/consolidateVisitSummary_config_LSSTCam_runs_DRP_20250527-20250921_w_2025_41_DM-53071_20251103T090534Z.py
2026-04-03 10:14:30.944   |   2026-04-03 10:14:30.944 |   2026-04-03 10:14:30.964 |    COMPLETED | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/packages/packages_LSSTCam_runs_DRP_20250527-20250921_w_2025_41_DM-53071_20251103T090534Z.yaml
2026-04-03 10:14:30.975   |   2026-04-03 10:14:30.975 |   2026-04-03 10:14:31.032 |    COMPLETED | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/consolidateSingleVisitStar_config/consolidateSingleVisitStar_config_LSSTCam_runs_DRP_20250527-20250921_w_2025_41_DM-53071_20251103T090534Z.py
2026-04-03 10:14:31.005   |   2026-04-03 10:14:31.005 |   2026-04-03 10:14:31.034 |    COMPLETED | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251103T090534Z/preliminary_visit_summary_schema/preliminary_visit_summary_schema_LSSTCam_runs_DRP_20250527-20250921_w_2025_41_DM-53071_20251103T090534Z.fits
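
With thousands of targets, the `request info` listing gets hard to read. The sketch below shows one way to summarise it, assuming the output has been saved as text and keeps the five-column `CREATED | STARTED | COMPLETED | STATE | TARGET` layout shown above. The function names are hypothetical, and the parser assumes target paths contain no `|` character.

```python
from collections import Counter


def parse_targets(info_text):
    """Return (state, target) pairs from the targets table of `request info`."""
    rows = []
    in_table = False
    for line in info_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 5:
            continue  # not a row of the five-column targets table
        if cols[3] == "STATE":
            in_table = True  # header row; data rows follow
            continue
        if in_table:
            rows.append((cols[3], cols[4]))
    return rows


def state_counts(info_text):
    """Count targets per state, e.g. how many are RUNNING vs COMPLETED."""
    return Counter(state for state, _ in parse_targets(info_text))


def unfinished(info_text):
    """List the targets whose state is anything other than COMPLETED."""
    return [t for state, t in parse_targets(info_text) if state != "COMPLETED"]
```

Applied to the listing above, `unfinished` would return only the top-level directory, which is the one target that never leaves RUNNING.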

I ran another test on a directory with only 277 subdirectories and no files, and saw the same symptoms:

> request info b4e50a62-1251-4fc5-957c-7eca0aa18e44
b4e50a62-1251-4fc5-957c-7eca0aa18e44:
status:           STARTED
arrived at:       2026-04-03 10:30:56.278
started at:       2026-04-03 10:30:56.3
last modified at: 2026-04-03 10:30:56.3
target prefix:    /
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2026-04-03 10:30:56.284   |   2026-04-03 10:30:56.284 |                         ? |      RUNNING | /pnfs/in2p3.fr/lsst/butler/drp_prep/LSSTCam/runs/DRP/20250527-20250921/w_2025_41/DM-53071/20251120T142129Z

I tried increasing the bulk limits, but nothing changed:

> request policy
Maximum concurrent (active) requests     :       1000
Maximum requests per user                :      80000
Maximum expansion depth                  :        ALL
Maximum flat targets                     :       100000
Maximum shallow targets                  :         100
Maximum recursive targets                :        100 (also tried with 50k)

We also set bulk.allowed-directory-expansion=ALL

Are there any limitations on using the bulk API for deletions? Or did I miss something in the configuration?
Cheers
