
Commit 83ff1aa: added finding missing section function

1 parent 3370bfa
2 files changed: 134 additions & 37 deletions

README.md: 102 additions & 37 deletions
@@ -1,46 +1,111 @@
-# xenium_analysis_tools
+# Xenium Analysis Tools
 
-[![License](https://img.shields.io/badge/license-MIT-brightgreen)](LICENSE)
-![Code Style](https://img.shields.io/badge/code%20style-black-black)
-[![semantic-release: angular](https://img.shields.io/badge/semantic--release-angular-e10079?logo=semantic-release)](https://github.com/semantic-release/semantic-release)
-![Interrogate](https://img.shields.io/badge/interrogate-100.0%25-brightgreen)
-![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)
-![Python](https://img.shields.io/badge/python->=3.10-blue?logo=python)
-
-## Usage
-- To use this template, click the green `Use this template` button and `Create new repository`.
-- After github initially creates the new repository, please wait an extra minute for the initialization scripts to finish organizing the repo.
-- To enable the automatic semantic version increments: in the repository go to `Settings` and `Collaborators and teams`. Click the green `Add people` button. Add `svc-aindscicomp` as an admin. Modify the file in `.github/workflows/tag_and_publish.yml` and remove the if statement in line 65. The semantic version will now be incremented every time a code is committed into the main branch.
-- To publish to PyPI, enable semantic versioning and uncomment the publish block in `.github/workflows/tag_and_publish.yml`. The code will now be published to PyPI every time the code is committed into the main branch.
-- The `.github/workflows/test_and_lint.yml` file will run automated tests and style checks every time a Pull Request is opened. If the checks are undesired, the `test_and_lint.yml` can be deleted. The strictness of the code coverage level, etc., can be modified by altering the configurations in the `pyproject.toml` file and the `.flake8` file.
-- Please make any necessary updates to the README.md and CITATION.cff files
-
-## Level of Support
-Please indicate a level of support:
-- [ ] Supported: We are releasing this code to the public as a tool we expect others to use. Issues are welcomed, and we expect to address them promptly; pull requests will be vetted by our staff before inclusion.
-- [ ] Occasional updates: We are planning on occasional updating this tool with no fixed schedule. Community involvement is encouraged through both issues and pull requests.
-- [ ] Unsupported: We are not currently supporting this code, but simply releasing it to the community AS IS but are not able to provide any guarantees of support. The community is welcome to submit issues, but you should not expect an active response.
-
-## Release Status
-GitHub's tags and Release features can be used to indicate a Release status.
-
-- Stable: v1.0.0 and above. Ready for production.
-- Beta: v0.x.x or indicated in the tag. Ready for beta testers and early adopters.
-- Alpha: v0.x.x or indicated in the tag. Still in early development.
+A Python library for processing and mapping Xenium spatial data, developed by the Allen Institute for Neural Dynamics.
 
 ## Installation
-To use the software, in the root directory, run
-```bash
-pip install -e .
+
+### Code Ocean Package Manager (Recommended)
+This library can be installed directly via the Code Ocean environment manager.
+
+1. Open your Capsule.
+2. Go to the **Environment** tab.
+3. In the **Pip** section, click **Add**.
+4. Paste the following link:
+```text
+git+https://github.com/AllenInstitute/xenium_analysis_tools#egg=xenium-analysis-tools
+
 ```
 
-To develop the code, run
+5. Click **Launch Cloud Workstation** to build.
+
+### Local Installation
+
+To install locally or in a standard terminal:
+
 ```bash
-pip install -e . --group dev
+pip install git+https://github.com/AllenInstitute/xenium_analysis_tools.git
+
 ```
-Note: --group flag is available only in pip versions >=25.1
 
-Alternatively, if using `uv`, run
-```bash
-uv sync
+---
+
+## Modules
+
+The library is organized into three primary sub-packages designed to handle different stages of the Xenium analysis pipeline.
+
+### 1. `process_xenium`
+
+Tools for processing raw Xenium outputs, managing SpatialData objects, and preparing data for downstream analysis.
+
+* **`process_dataset_slides`**: Main workflow for processing slides across an entire dataset.
+* **`process_spatialdata`**: Core logic for manipulating and formatting Xenium `SpatialData` objects.
+* **`divide_sections`**: Utilities for handling section boundaries and splitting data.
+* **`validate_sections`**: Quality control checks to ensure section integrity before processing.
+* **`generate_dataset_slides`**: Helper functions for creating slide-level representations.
+
+### 2. `map_xenium`
+
+Functions for mapping cell types to Xenium data using reference taxonomies.
+
+* **`map_sections`**: Logic for mapping cell types on individual tissue sections.
+* **`map_dataset_sections`**: Batch processing tools to apply mapping across multiple sections in a dataset.
+
+### 3. `utils`
+
+Shared utility functions used across the library.
+
+* **`io_utils`**: Standardized functions for loading and saving Xenium data structures.
+
+---
+
+## Usage
+
+Import the specific modules you need for your analysis workflow.
+
+**Example: Processing a Dataset**
+
+```python
+from xenium_analysis_tools.process_xenium import process_dataset_slides
+from xenium_analysis_tools.utils import io_utils
+
+# Load your configuration or data path
+data_path = "/path/to/xenium/data"
+
+# Run the processing pipeline
+process_dataset_slides.run(data_path)
+
+```
+
+**Example: Mapping Sections**
+
+```python
+from xenium_analysis_tools.map_xenium import map_dataset_sections
+
+# Run cell type mapping on processed sections
+map_dataset_sections.run_mapping(
+    processed_data_path="/path/to/processed/data",
+    taxonomy_ref="/path/to/taxonomy"
+)
+
 ```
+
+---
+
+## Development
+
+### Updating the Package
+
+1. Make changes to the code in the `src/` directory.
+2. Bump the version in `src/xenium_analysis_tools/__init__.py`.
+3. Commit and push to GitHub.
+4. Create and push a new tag matching the version (e.g., `v0.1.1`).
+
+### Running Tests
+
+This project uses `pytest`. Run the following in the root directory:
+
+```bash
+pytest tests/
+
+```
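The "Updating the Package" steps in the new Development section describe the release flow without showing commands. A minimal sketch of what they might look like in a terminal (the version number, branch name, and commit message below are illustrative placeholders, not part of this commit):

```bash
# Steps 1-2: edit code under src/ and bump __version__ in
# src/xenium_analysis_tools/__init__.py, then:

# Step 3: commit and push the change
git add src/xenium_analysis_tools/__init__.py
git commit -m "Bump version to 0.1.1"
git push origin main

# Step 4: create and push a tag matching the new version
git tag v0.1.1
git push origin v0.1.1
```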

src/xenium_analysis_tools/process_xenium/generate_dataset_slides.py: 32 additions & 0 deletions
@@ -15,6 +15,28 @@
 )
 from xenium_analysis_tools.process_xenium.process_spatialdata import read_xenium_slide
 
+def find_xenium_bundle(bundle_name, data_folder='/root/capsule/data'):
+    data_folder = Path(data_folder)
+    search_paths = [
+        data_folder / 'xenium_data',
+        data_folder / 'Xenium_output_pilot'
+    ]
+    search_paths = [path for path in search_paths if path.exists()]
+    all_dirs = np.concatenate([list(folder.iterdir()) for folder in search_paths])
+    output_folders = np.concatenate([list(folder.glob('output-*')) for folder in search_paths])
+    subfolders = np.setdiff1d(all_dirs, output_folders)
+    path_to_bundle = None
+    found_dirs = [dir for dir in output_folders if dir.name == bundle_name]
+    if found_dirs:
+        path_to_bundle = found_dirs[0]
+    else:
+        for sub in subfolders:
+            found_dirs = [dir for dir in list(sub.iterdir()) if dir.name == bundle_name]
+            if found_dirs:
+                path_to_bundle = found_dirs[0]
+                break
+    return path_to_bundle
+
 def generate_slides(dataset_name: str, config_path: str=None, select_sections: list[int]|None = None):
     """
     Generate slide-level SpatialData objects from raw Xenium data bundles.
@@ -55,6 +77,16 @@ def generate_slides(dataset_name: str, config_path: str=None, select_sections: list[int]|None = None):
             logger.info(f"Slide {slide_id} already processed. Skipping.")
             continue
         logger.info(f"Generating SpatialData object for slide {slide_id}...")
+        if not (raw_slide_path / 'experiment.xenium').exists():
+            logger.info(f"Experiment file not found for slide {slide_id} at {raw_slide_path / 'experiment.xenium'}")
+            logger.info(f"Looking for alternative experiment file...")
+            path_to_bundle = find_xenium_bundle(slide_row['dir'], data_folder=paths['data_root'])
+            if path_to_bundle is not None:
+                logger.info(f"Found alternative experiment file in {path_to_bundle.parent}")
+                raw_slide_path = path_to_bundle
+            else:
+                logger.error(f"Could not find experiment file for slide {slide_id}. Skipping.")
+                continue
         logger.info(f"Reading Xenium bundle: {raw_slide_path}")
         sdata_reader_params = config.get('sdata_reader_params', {})
         if sdata_reader_params.get('n_jobs') == "max": sdata_reader_params['n_jobs'] = os.cpu_count()
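The new `find_xenium_bundle` helper searches the `output-*` folders directly under the known data roots first, then falls back to looking one level deeper inside the remaining subfolders. The committed version relies on `numpy` (`np`) and `pathlib.Path`, presumably imported earlier in the module (not shown in this hunk). A standalone sketch of the same two-level lookup using only `pathlib` (hypothetical function name, for illustration only):

```python
from pathlib import Path

def find_bundle_sketch(bundle_name: str, data_folder: str = "/root/capsule/data") -> Path | None:
    """Locate a Xenium bundle by name: first among output-* folders directly under
    the known data roots, then one level deeper inside the remaining subfolders."""
    roots = [Path(data_folder) / "xenium_data", Path(data_folder) / "Xenium_output_pilot"]
    roots = [root for root in roots if root.exists()]

    # Pass 1: output-* bundles sitting directly under a data root
    output_folders = [p for root in roots for p in root.glob("output-*")]
    for candidate in output_folders:
        if candidate.name == bundle_name:
            return candidate

    # Pass 2: look one level deeper inside the non-output subfolders
    for root in roots:
        for sub in root.iterdir():
            if sub in output_folders or not sub.is_dir():
                continue
            for child in sub.iterdir():
                if child.name == bundle_name:
                    return child
    return None
```

In the second hunk above, `generate_slides` calls the real helper as a fallback when `experiment.xenium` is missing from the expected slide path, and skips the slide if no bundle is found.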
