CellProfiling/pyshowwwcase

PyShowwwcase

PyShowwwcase is a reusable Django front-end for batch processing pipelines. It gives any "pure compute" command-line pipeline (originally bio-imaging tools) a multi-user web interface to:

  • upload input files and organize them into per-user projects,
  • submit a run of the backend pipeline against a project (a queued job),
  • show the results once the pipeline finishes.

The web layer knows nothing about what the pipeline does, and the pipeline knows nothing about the web layer. They are deliberately decoupled through a database job queue and a couple of cron loops, so the same front-end can be reskinned for many different pipelines with only a small, well-marked block of custom code per pipeline.

Audience / trust model. PyShowwwcase is an internal / lab tool. Guests are allowed by default and the run form lets users craft the pipeline's arguments, so anyone who can reach it can run the backend pipeline with inputs of their choosing. It is not hardened for hostile public exposure — keep it behind trusted networking. See Security model.


What you get

  • A file/project manager: login (or guest/demo access), create projects, upload files (with optional zip extraction), browse, rename, download, delete.
  • A run page: a pipeline-specific form that builds the backend's arguments and enqueues a job.
  • A queue page: see queued / running jobs and cancel your own queued ones.
  • A show page: a pipeline-specific results view (inline images, an embedded viewer, or a redirect to an external viewer such as EllViewer).
  • A template you copy to create a new pipeline integration, plus three reference implementations.

Architecture

PyShowwwcase has three moving parts: the web app, a MariaDB job queue, and a launcher that runs the real pipeline as its own Docker image.

            ┌──────────────────────────── host ────────────────────────────┐
            │                                                               │
 browser ──►│  nginx ──► gunicorn (pyshowwwcase_<impl> container)           │
            │                │   ▲                                          │
            │      enqueue   │   │ read results                             │
            │      job       ▼   │                                          │
            │            ┌─────────────┐                                    │
            │            │  MariaDB    │   `queue` table (one row per run)   │
            │            │  (maria_db) │                                     │
            │            └─────────────┘                                    │
            │              ▲        │ promote queued→started                │
            │   cron(1/min)│        ▼  drop args file + run_id              │
            │   docker exec … run_queue.sh → manage.py queue (state machine)│
            │                          │ writes ARGS_FILE, RUN_ID_FILE       │
            │                          ▼ to <FILES_ROOT>/pyshowwwcase/       │
            │   cron(1/min)   pyshowwwcase.sh (host, flock-guarded)          │
            │                          │ moves args into the pipeline work   │
            │                          ▼ dir, then:                          │
            │                 docker compose run -d <pipeline> ──► pipeline  │
            │                          │      writes results into the        │
            │                          ▼      project dir + LAST_RUN_ID       │
            │   (next queue tick sees LAST_RUN_ID == run_id → job done)      │
            └───────────────────────────────────────────────────────────────┘

The run lifecycle

  1. Enqueue. The user submits the run form. start_run validates it, builds the pipeline's argument payload (run_code), and inserts a row into the queue table with status queued. The web request returns immediately.
  2. Promote. manage.py queue (the state machine in management/commands/queue.py) runs once a minute inside the web container via run_queue.sh. It processes one job at a time:
    • if a job is already started, it checks the LAST_RUN_ID file the pipeline writes on completion; if it matches, the job is removed from the queue;
    • otherwise it takes the oldest queued job, writes its run_code to the args file (ARGS_FILE) and the run id to RUN_ID_FILE under <FILES_ROOT>/pyshowwwcase/, clears any stale results, and marks it started.
  3. Launch. pyshowwwcase.sh runs once a minute on the host (cron), guarded by flock so invocations can't overlap. When the pipeline isn't already running and an args file is waiting, it moves the args + run id into the pipeline's work directory and launches the real pipeline with docker compose run -d <pipeline>.
  4. Execute. The pipeline container reads the args, processes the project's inputs, writes its outputs back into the project directory (a shared volume), and writes LAST_RUN_ID when it finishes.
  5. Complete. The next queue tick sees LAST_RUN_ID == run_id and removes the job. The user opens show, and show_project reads the pipeline's outputs and renders them.
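The promote and complete steps above can be sketched as a single tick function. This is a minimal illustration, not the code in management/commands/queue.py: the `Job` class, the in-memory job list, and the file names `args.txt` and `run_id.txt` are hypothetical stand-ins for the queue table, ARGS_FILE, and RUN_ID_FILE. It also assumes that a tick which completes a job does not promote the next one in the same pass, and it omits the clearing of stale results.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Job:
    run_id: str
    run_code: str            # argument payload built when the run was enqueued
    status: str = "queued"   # "queued" or "started"


def queue_tick(jobs: list[Job], handoff_dir: Path, last_run_id_file: Path) -> str:
    """Run one pass of the state machine and report what happened.

    `jobs` is assumed to be ordered oldest first.
    """
    started = next((j for j in jobs if j.status == "started"), None)
    if started is not None:
        # A job is in flight: it is done once the pipeline has written
        # a completion marker that matches its run id.
        finished = (
            last_run_id_file.exists()
            and last_run_id_file.read_text().strip() == started.run_id
        )
        if finished:
            jobs.remove(started)
            return "completed"
        return "waiting"

    queued = next((j for j in jobs if j.status == "queued"), None)
    if queued is None:
        return "idle"

    # Hand the oldest queued job over to the launcher as two plain files.
    handoff_dir.mkdir(parents=True, exist_ok=True)
    (handoff_dir / "args.txt").write_text(queued.run_code)
    (handoff_dir / "run_id.txt").write_text(queued.run_id)
    queued.status = "started"
    return "promoted"
```

Calling `queue_tick` repeatedly mimics the once-a-minute cron schedule: the first call promotes a job, later calls report "waiting" until the completion marker appears, and only then does the next queued job get its turn.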

Because only one job is started at a time and the launcher refuses to start a second pipeline while one is running, runs execute serially.
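The launcher's guard can be illustrated in Python, although the real pyshowwwcase.sh is a shell script that uses flock. The sketch below shows the same idea with `fcntl.flock` from the standard library (Unix only). The function names, the lock path, and the `pipeline_running` flag are assumptions made for the example, and the actual `docker compose run` call is left out.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def single_instance(lock_path):
    """Yield True if this invocation holds the lock, False otherwise."""
    with open(lock_path, "w") as handle:
        try:
            # Non-blocking exclusive lock: fail at once instead of waiting.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def launcher_tick(lock_path, args_file: Path, pipeline_running: bool) -> str:
    """Decide what one cron invocation of the launcher should do."""
    with single_instance(lock_path) as have_lock:
        if not have_lock:
            return "skipped: previous invocation still active"
        if pipeline_running or not args_file.exists():
            return "nothing to do"
        # The real launcher would move the args and run id into the
        # pipeline work directory here and start the container.
        return "launch"
```

The two checks do different jobs: the lock prevents two launcher invocations from overlapping, while the `pipeline_running` test keeps a second pipeline from starting while one is still executing.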

Why decoupled this way

The pipeline runs as its own Docker image with its own dependencies and resource limits, completely isolated from the Django container. The web app never imports or calls the pipeline directly — it only leaves a job in a table and a file on disk. This is what makes the front-end reusable: swapping pipelines means changing the small custom block and the launcher wiring, not the web app.


Repository layout

pyshowwwcase/
├── README.md                ← this file
├── implementations.md       ← the 3 reference implementations & their decisions
├── howto.md                 ← step-by-step: template → new implementation
├── template/                ← the generic skeleton you copy
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── run_queue.sh         ← in-container: `python manage.py queue`
│   ├── pyshowwwcasesite/    ← the Django project
│   │   ├── manage.py
│   │   └── pyshowwwcase/    ← the Django app
│   │       ├── views.py          ← base views + a `# CUSTOM CODE` section
│   │       ├── database.py       ← SQLAlchemy engine + `users`/`queue` tables
│   │       ├── security.py       ← auth backend (mirrors an external users table)
│   │       ├── forms.py, models.py, urls.py
│   │       ├── management/commands/queue.py   ← the queue state machine
│   │       ├── templates/pyshowwwcase/*.html  ← UI (run.html / show.html are custom)
│   │       └── static/
│   └── setup/               ← deployment artifacts (per-host)
│       ├── docker-compose.yml    ← maria_db + the web service
│       ├── .env.example          ← template for secrets & config (copy to .env; DB, ports, SECRET_KEY, hosts)
│       ├── init.sql              ← DB schema (users, queue)
│       ├── pyshowwwcase_nginx    ← reverse-proxy config
│       ├── pyshowwwcase.sh       ← host launcher (cron)
│       └── crontab               ← the two cron entries
└── implementations/
    ├── pipex/               ← PIPEX pipeline (rich, self-hosted results)
    ├── proticelli/          ← protein viewer (redirects to external EllViewer)
    └── hpaseg/              ← HPAseg segmentation (generates an EllViewer CSV)

The template/ vs implementations/ split

Everything generic (auth, the file manager, the queue, the base views, the shared HTML) lives identically in template/ and in every implementation. Each implementation specializes only:

  • the # FROM HERE GOES THE CUSTOM CODE block in views.py (start_run, show_project, and any pipeline-specific helpers / serve_image / viewer),
  • run.html (the run form) and show.html (the results page, sometimes bypassed by a redirect to a viewer),
  • the custom settings (settings.py) and the setup/ wiring for that pipeline.

template/ is the canonical base — fixes to shared code are made there and kept byte-identical across the implementations.


Key concepts

  • Project — a per-user directory holding a run's inputs and (after a run) its outputs. A user's projects live under BASE_DIRECTORY/<user-or-guest-id>/<project>/.
  • Users, guests, demo — authenticated users are mirrored from an external users table into Django on login (security.py). When ALLOW_GUESTS is on, anonymous visitors get a project space keyed by client IP. Names in DEMO_USERLIST get read-only/demo access.
  • The args file & run id — the pipeline's inputs are handed over as a plain file (ARGS_FILE, e.g. a batch CSV or a command line) plus a RUN_ID_FILE; completion is signalled back via LAST_RUN_ID. This file handshake is the only contract between PyShowwwcase and the pipeline.
  • Show — show_project reads the pipeline's output files and presents them. Implementations do this differently (inline canvas, embedded TissUUmaps, or a redirect to EllViewer); see implementations.md.
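From the pipeline's side, the file handshake described above might look like the following sketch. It is not taken from any of the reference pipelines: the file names `args.txt` and `run_id.txt`, the location of `LAST_RUN_ID` inside the work directory, and the `process` callback are all assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def run_pipeline_once(work_dir: Path, process: Callable[[str], None]) -> str:
    """Consume one handed-over job and signal completion; returns the run id."""
    args = (work_dir / "args.txt").read_text()
    run_id = (work_dir / "run_id.txt").read_text().strip()

    # Do the actual work. Outputs are expected to land in the project
    # directory, which this sketch leaves to the callback.
    process(args)

    # Write the completion marker last, via a rename, so the queue never
    # sees a half-written file.
    tmp = work_dir / "LAST_RUN_ID.tmp"
    tmp.write_text(run_id)
    tmp.replace(work_dir / "LAST_RUN_ID")
    return run_id
```

Writing the marker only after the outputs exist matters because the web side treats a matching LAST_RUN_ID as the sole signal that results are ready to show.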

Configuration surface

Two files hold everything you tune per deployment:

  • setup/.env.example (copy to .env at deploy) — secrets and host-specific values: MariaDB credentials, ports, the Django SECRET_KEY, DEBUG, the data root (*_FILES_ROOT), and *_SERVER_NAME (a space-separated host list from which CSRF_TRUSTED_ORIGINS and ALLOWED_HOSTS are derived). Loaded into the web container via env_file.
  • settings.py (custom section) — the integration knobs: REDIRECT_BASE_URL, STATIC_URL, BASE_DIRECTORY, FILES_ROOT, RUN_DIRECTORY, ARGS_FILE, RUN_ID_FILE, OUTPUT_FILE, SHOW_DIRECTORY, SHOW_IMAGE_SIZE, ALLOW_GUESTS, DEMO_USERLIST, and the per-guest/per-user limits (GUEST_* / USER_*).

All four projects (template/ and the three implementations) keep the same set of settings entries; only the values differ, so an integration is mostly a matter of filling these in.
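As an illustration of how a space-separated host list can feed both Django settings, here is a minimal sketch. The function name and the `https` default are assumptions, and the project's settings.py may derive the values differently. The one firm constraint is Django's own: since Django 4.0, entries in CSRF_TRUSTED_ORIGINS must include a scheme, while ALLOWED_HOSTS entries are bare host names.

```python
def hosts_and_origins(server_name: str, scheme: str = "https"):
    """Split a space-separated host list into (ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS)."""
    hosts = server_name.split()          # tolerates repeated spaces
    origins = [f"{scheme}://{host}" for host in hosts]
    return hosts, origins


# Hypothetical value, as it might appear in .env:
ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS = hosts_and_origins("lab.example.org 10.0.0.5")
```

Deriving both lists from one variable keeps them from drifting apart when a host is added or renamed.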


Security model

PyShowwwcase is built for a trusted, internal audience:

  • Guests are enabled by default and keyed by client IP (so users behind the same NAT share a space).
  • The run form lets users supply the pipeline's arguments — this is by design (it's the whole point), but it means a user can run the backend pipeline with arbitrary inputs. Never expose an instance to untrusted users.
  • DEBUG defaults to on.
  • Passwords are mirrored from the external users table using its existing (unsalted SHA-256) hashing — PyShowwwcase doesn't own that scheme.

Hardening that is in place (shared base): path-traversal containment (safe_path), session guards on all page views (require_session), bounded / staged zip extraction, and unique-temp-dir downloads.
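Path-traversal containment of the kind safe_path provides is commonly implemented by resolving the requested path and checking that it still lies under the allowed root. The sketch below shows that general technique. It is not the project's safe_path, whose signature and behaviour are not documented here, and the name `safe_path_sketch` is hypothetical.

```python
from pathlib import Path


def safe_path_sketch(base: Path, *parts: str) -> Path:
    """Join user-supplied parts onto base, refusing anything that escapes it."""
    root = base.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check runs against the real target.
    candidate = root.joinpath(*parts).resolve()
    if not candidate.is_relative_to(root):   # Path.is_relative_to: Python 3.9+
        raise ValueError(f"path escapes {root}")
    return candidate
```

An absolute part such as `/etc/passwd` replaces the base when joined, so it fails the same check as a `..` sequence would.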


Requirements & versions

  • Python 3.12 (Docker base python:3.12.13-slim).
  • Django 4.2 LTS, SQLAlchemy 2, gunicorn, mysql-connector-python.
  • MariaDB 10.3 (via Docker).
  • A host with Docker + docker compose, cron, and nginx.

Getting started

  • To understand the three existing integrations and the reasoning behind each, read implementations.md.
  • To build your own integration from the template, follow howto.md.
