Adding support for .jpk-qi-data and .bin files#190
Conversation
…igger point or by contact point
… a function to general loader
…make the save an adjustable option
… once, then extracting separately
…files and removing the ability to run curve analysis directly in the reader
…ore robust coordinate access
…o eagerly load lazy loaded data
…me efficient method
…g to memory then saving in one go when duplicating data to h5
…sampled curves then are streamed directly into h5
…much more memory efficient
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #190 +/- ##
===========================================
- Coverage 79.84% 51.60% -28.25%
===========================================
Files 12 15 +3
Lines 928 1804 +876
===========================================
+ Hits 741 931 +190
- Misses 187 873 +686 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
SylviaWhittle
left a comment
There was a problem hiding this comment.
Small partial review.
I'd like to call with you to talk about the interesting lazy loading of data if you have time?
| return frames, pixel_to_nanometre_scaling_factor, header_dict | ||
|
|
||
|
|
||
| def get_asd_channels(file_path: Path): |
There was a problem hiding this comment.
I don't have time to check this manually - does it work? There isn't a test but frankly we don't have the dev time to move slowly. If you say it works, this is fine with me.
There was a problem hiding this comment.
The channel fetching seems to work though I'm happy to quickly make some tests for them as that should be pretty quick.
| elif len(h5_returned) == 4: | ||
| image, pixel_to_nanometre_scaling_factor, _, curve_data = h5_returned # type: ignore[misc] | ||
| self.loaded_curves = True | ||
| print( |
There was a problem hiding this comment.
| print( | |
| logger.info( |
or just remove it.
There was a problem hiding this comment.
Yes, I'll just remove them, they are some accidentally left in prints for debugging.
| f"Loaded image with shape {image.shape} and pixel to nanometre " | ||
| f"scaling factor {pixel_to_nanometre_scaling_factor}" | ||
| ) | ||
| print(f"Image has max value {image.max()} and min value {image.min()}") |
| image, pixel_to_nanometre_scaling_factor = spm.load_spm(self.filepath, self.channel) | ||
| elif self.suffix == ".h5-jpk": | ||
| image, pixel_to_nanometre_scaling_factor, _ = h5_jpk.load_h5jpk(self.filepath, self.channel) | ||
| h5_returned = h5_jpk.load_h5jpk(self.filepath, self.channel, load_curves=not self.loaded_curves) |
There was a problem hiding this comment.
Could this entire block be absorbed into the h5_jpk.load_h5jpk function call? It's a little messy here.
I'll have a go too.
There was a problem hiding this comment.
Yes, is definitely a bit messy. My thinking was preventing the need to make changes to topostats by not changing what each load function returns if it isn't reading the curve data I'm adding support for. Though I think h5jpk is not used in TopoStats anyway? And it might be necessary to make changes to topostats reading anyway due to the changes I have been working on for returning the z unit for the read files?
| raise e | ||
|
|
||
| # scope for a "check what channels are available" function similar to above. | ||
| def get_available_channels(self): # noqa: C901 |
There was a problem hiding this comment.
Guessing this is for the napari feature to list the channels?
Well done if this is all working, that's a lot of work.
| A proxy object for the specified row. | ||
| """ | ||
|
|
||
| class RowProxy: |
There was a problem hiding this comment.
Is the reason RowProxy is defined within the scope of LazyQiData encapsulation? It'll get redefined each time __getitem__ runs, though this is likely a negligible and unimportant cost.
| self.parent = parent | ||
| self.y = y | ||
|
|
||
| def __getitem__(self, x: int): |
There was a problem hiding this comment.
I'm going to need a bit of an explanation on how this is intended to work - could we set up a call to talk about it? There's a lot going on here. Plus, frankly I'm just curious.
| # Set the format to have blue time, green file, module, function and line, and white message | ||
| logger.add( | ||
| sys.stderr, | ||
| lambda msg: sys.stderr.write(msg), # pylint: disable=unnecessary-lambda |
There was a problem hiding this comment.
What necessitated this change?
There was a problem hiding this comment.
It seems like the general_loader and some of the spm tests which required were failing due to not being able to correctly reading the logged output (instead always reading an empty string), they seem to be failing when I run even the main branch locally for that reason. The lamda appears to be necessary due to a quirk of loguru where lamda makes sys.stderr be dynamically retrieved at the time of logging rather than once at import time.
| return final_multiplier, final_offset, unit | ||
|
|
||
|
|
||
| class jpk_qi_loader: |
There was a problem hiding this comment.
Capitalise this please :)
| # Open the ZIP archive once and keep it open for the duration of the loading process | ||
| self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with | ||
| logger.info(f"Opened JPK QI archive at {self.filepath}") | ||
| self.namelist = self.qi_archive.namelist() |
There was a problem hiding this comment.
Maybe call this list_of_all_paths or something - since namelist could be anything? Just so future people (including ourselves) can tell at a glance what this is.
Maybe also add a comment above as to what it's used for? :)
| self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with | ||
| logger.info(f"Opened JPK QI archive at {self.filepath}") | ||
| self.namelist = self.qi_archive.namelist() | ||
| # Set path to the .jpk-qi-image file within the archive for later use |
There was a problem hiding this comment.
| # Set path to the .jpk-qi-image file within the archive for later use | |
| # For holding the reference to where the actual .jqk-qi image is (not the metadata). |
| if self.save_as_h5: | ||
| self.save_lite_data() | ||
|
|
||
| return (self.image, self.px2nm, (self.curve_data, self.channels_units, self.full_metadata)) |
There was a problem hiding this comment.
Possibly bundle these objects into one metadata dataclass, JPKQiCurveData? Gentle suggestion.
Adds support for .jpk-qi-data files, allowing the loading of the image for the selected channel. It also allows for the raw force-distance curve data as well as metadata for each curve to be loaded efficiently, using a lazy data structure approach that loads each section of the dataset into memory only when it is required.
The updates also allow for the conversion of .jpk-qi-data files into a HDF5 file format (.h5-jpk) for much faster mass analysis. Hence, the h5-jpk loading code has also been updated to support the loading of these converted files.
The update also allows .bin with different binary formats to be loaded.
closes #174 closes #173