Commit 50cf2aa

Merge branch '182_precursor_request_patch' into 'master'
Restructure format for LC Metabref query and add bulk molecular formula searching

Closes #182, #110, and #142

See merge request mass-spectrometry/corems!144
2 parents 6791b0c + 52d5cae commit 50cf2aa

29 files changed

Lines changed: 12970 additions & 12015 deletions

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 3.1.0
+current_version = 3.2.0
 commit = False
 tag = False
 

CONTRIBUTING.md

Lines changed: 7 additions & 3 deletions
@@ -5,6 +5,7 @@ Thank you for considering contributing to CoreMS! We appreciate your interest in
 ## Table of Contents
 
 - [Getting Started](#getting-started)
+- [Versioning](#versioning)
 - [Merge Request Checklist](#merge-request-checklist)
 - [Code Style](#code-style)
 - [Issue Reporting](#issue-reporting)
@@ -19,9 +20,12 @@ To get started with contributing to CoreMS, please follow these steps:
 3. Install the necessary dependencies. Refer to the [README](./README.md) for detailed installation instructions.
 4. Make your changes or additions.
 5. Test your changes thoroughly.
-6. Re-render documenation using the following `pdoc --o docs --d numpy corems`. Note that pdoc versioning is part of the requirements-dev.txt.
-7. Commit your changes and push them to your forked repository. Reference your original issue in your commits (i.e. closes #23)
-8. Submit a merge request to the main CoreMS repository and select an appropriate reviewer for the changes. Note the merge request checklist below that will be checked before each merge into the master branch. See the merge request checkliist
+6. Commit your changes and push them to your forked repository. Reference your original issue in your commits (i.e. closes #23)
+7. Submit a merge request to the main CoreMS repository and select an appropriate reviewer for the changes. Note the merge request checklist below that will be checked before each merge into the master branch. See the merge request checklist
+
+## Versioning
+
+We strive to use semantic versioning. To bump a new version and regenerate documentation, use one of the following make commands (according to version number) `make major`, `make minor`, or `make patch`. This should accompany each PyPI release.
 
 ## Merge Request Checklist
 

Makefile

Lines changed: 3 additions & 0 deletions
@@ -14,14 +14,17 @@ mem:
 major:
 
 	@bumpversion major --allow-dirty
+	@$(MAKE) docu
 
 minor:
 
 	@bumpversion minor --allow-dirty
+	@$(MAKE) docu
 
 patch:
 
 	@bumpversion patch --allow-dirty
+	@$(MAKE) docu
 
 pypi_test:
 	@rm -rf build dist *.egg-info

README.md

Lines changed: 3 additions & 3 deletions
@@ -49,7 +49,7 @@ CoreMS aims to provide
 
 ## Current Version
 
-`3.1.0`
+`3.2.0`
 
 ***
 
@@ -335,11 +335,11 @@ UML (unified modeling language) diagrams for Direct Infusion FT-MS and GC-MS cla
 
 If you use CoreMS in your work, please use the following citation:
 
-Version [3.1.0 Release on GitHub](https://github.com/EMSL-Computing/CoreMS/releases/tag/v3.1.0), archived on Zenodo:
+Version [3.2.0 Release on GitHub](https://github.com/EMSL-Computing/CoreMS/releases/tag/v3.2.0), archived on Zenodo:
 
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14009575.svg)](https://doi.org/10.5281/zenodo.14009575)
 
-Yuri E. Corilo, William R. Kew, Lee Ann McCue, Katherine R . Heal, James C. Carr (2024, October 29). EMSL-Computing/CoreMS: CoreMS 3.1.0 (Version v3.1.0), as developed on Github. Zenodo. http://doi.org/10.5281/zenodo.14009575
+Yuri E. Corilo, William R. Kew, Lee Ann McCue, Katherine R . Heal, James C. Carr (2024, October 29). EMSL-Computing/CoreMS: CoreMS 3.2.0 (Version v3.2.0), as developed on Github. Zenodo. http://doi.org/10.5281/zenodo.14009575
 
 ```
 

corems/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 __author__ = "Yuri E. Corilo"
-__version__ = "3.1.0"
+__version__ = "3.2.0"
 import time
 import os
 import sys

corems/mass_spectrum/input/massList.py

Lines changed: 27 additions & 22 deletions
@@ -106,6 +106,7 @@ def add_molecular_formula(self, mass_spec_obj, dataframe):
         mass_spec_mz_exp_list = mass_spec_obj.mz_exp
 
         for df_index, mz_exp in enumerate(mz_exp_df):
+            bad_mf = False
             counts = 0
 
             ms_peak_index = list(mass_spec_mz_exp_list).index(float(mz_exp))
@@ -200,28 +201,32 @@ def add_molecular_formula(self, mass_spec_obj, dataframe):
                             matched_isos.append(iso)
 
                     if len(matched_isos) == 0:
-                        raise ValueError("No isotopologue matched the formula_dict")
-                    mfobj = matched_isos[0]
-
-                    # Add the mono isotopic index, confidence score and isotopologue similarity
-                    mfobj.mspeak_index_mono_isotopic = int(
-                        dataframe.iloc[df_index]["Mono Isotopic Index"]
-                    )
-
-                # Add the confidence score and isotopologue similarity and average MZ error score
-                if "m/z Error Score" in dataframe:
-                    mfobj._mass_error_average_score = float(
-                        dataframe.iloc[df_index]["m/z Error Score"]
-                    )
-                if "Confidence Score" in dataframe:
-                    mfobj._confidence_score = float(
-                        dataframe.iloc[df_index]["Confidence Score"]
-                    )
-                if "Isotopologue Similarity" in dataframe:
-                    mfobj._isotopologue_similarity = float(
-                        dataframe.iloc[df_index]["Isotopologue Similarity"]
-                    )
-                mass_spec_obj[ms_peak_index].add_molecular_formula(mfobj)
+                        #FIXME: This should not occur see https://code.emsl.pnl.gov/mass-spectrometry/corems/-/issues/190
+                        warnings.warn(f"No isotopologue matched the formula_dict: {formula_dict}")
+                        bad_mf = True
+                    else:
+                        bad_mf = False
+                        mfobj = matched_isos[0]
+
+                        # Add the mono isotopic index, confidence score and isotopologue similarity
+                        mfobj.mspeak_index_mono_isotopic = int(
+                            dataframe.iloc[df_index]["Mono Isotopic Index"]
+                        )
+                if not bad_mf:
+                    # Add the confidence score and isotopologue similarity and average MZ error score
+                    if "m/z Error Score" in dataframe:
+                        mfobj._mass_error_average_score = float(
+                            dataframe.iloc[df_index]["m/z Error Score"]
+                        )
+                    if "Confidence Score" in dataframe:
+                        mfobj._confidence_score = float(
+                            dataframe.iloc[df_index]["Confidence Score"]
+                        )
+                    if "Isotopologue Similarity" in dataframe:
+                        mfobj._isotopologue_similarity = float(
+                            dataframe.iloc[df_index]["Isotopologue Similarity"]
+                        )
+                    mass_spec_obj[ms_peak_index].add_molecular_formula(mfobj)
 
 
 class ReadMassList(MassListBaseClass):
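
Review note: the massList.py change replaces a hard `ValueError` with a warning plus a `bad_mf` flag, so one unmatched formula no longer aborts the whole import. The pattern can be sketched in isolation as below. This is only an illustration under stated assumptions: `pick_first_match` and `assign_formulas` are hypothetical helper names, not CoreMS functions, and the rows are plain tuples rather than dataframe rows.

```python
import warnings


def pick_first_match(matched_isos, formula_dict):
    """Return the first matched isotopologue, or None (with a warning) if none matched."""
    if len(matched_isos) == 0:
        # Warn instead of raising, so the caller can skip this row and continue
        warnings.warn(f"No isotopologue matched the formula_dict: {formula_dict}")
        return None
    return matched_isos[0]


def assign_formulas(rows):
    """Collect one formula per row that has a match; skip (but report) the rest."""
    assigned = []
    for formula_dict, matched_isos in rows:
        mfobj = pick_first_match(matched_isos, formula_dict)
        if mfobj is None:
            continue  # plays the role of the bad_mf flag in the diff
        assigned.append(mfobj)
    return assigned
```

Returning `None` keeps the "skip this row" decision next to the warning, which may be easier to follow than a flag that is set in one branch and read several lines later.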

corems/molecular_id/search/database_interfaces.py

Lines changed: 68 additions & 14 deletions
@@ -646,12 +646,14 @@ def __init__(self):
         super().__init__()
 
         # API endpoint for precursor m/z search
+        # inputs = mz, tolerance (in Da), polarity, page_no, per_page
         self.PRECURSOR_MZ_URL = (
-            "https://metabref.emsl.pnnl.gov/api/precursors/m/{}/t/{}/{}"
+            "https://metabref.emsl.pnnl.gov/api/precursors/m/{}/t/{}/{}?page={}&per_page={}"
         )
 
         # API endpoint for returning full list of precursor m/z values in database
-        self.PRECURSOR_MZ_ALL_URL = "https://metabref.emsl.pnnl.gov/api/precursors/{}"
+        # inputs = polarity, page_no, per_page
+        self.PRECURSOR_MZ_ALL_URL = "https://metabref.emsl.pnnl.gov/api/precursors/{}?page={}&per_page={}"
 
         self.__init_format_map__()
 
@@ -674,7 +676,7 @@ def __init_format_map__(self):
         self.format_map["fe"] = self.format_map["flashentropy"]
         self.format_map["flash-entropy"] = self.format_map["flashentropy"]
 
-    def query_by_precursor(self, mz_list, polarity, mz_tol_ppm, mz_tol_da_api=0.2):
+    def query_by_precursor(self, mz_list, polarity, mz_tol_ppm, mz_tol_da_api=0.2, max_per_page=50):
         """
         Query MetabRef by precursor m/z values.
 
@@ -690,6 +692,8 @@ def query_by_precursor(self, mz_list, polarity, mz_tol_ppm, mz_tol_da_api=0.2):
         mz_tol_da_api : float, optional
             Maximum tolerance between precursor m/z values for API search, in daltons.
             Used to group similar mzs into a single API query for speed. Default is 0.2.
+        max_per_page : int, optional
+            Maximum records to return from MetabRef API query at a time. Default is 50.
 
         Returns
         -------
@@ -705,7 +709,7 @@ def query_by_precursor(self, mz_list, polarity, mz_tol_ppm, mz_tol_da_api=0.2):
         mz_list.sort()
         mz_groups = [[mz_list[0]]]
         for x in mz_list[1:]:
-            if abs(x - mz_groups[-1][-1]) <= mz_tol_da_api:
+            if abs(x - mz_groups[-1][0]) <= mz_tol_da_api:
                 mz_groups[-1].append(x)
             else:
                 mz_groups.append([x])
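
Review note: comparing each value against the first member of the current group (index `0`) rather than the last (index `-1`) bounds the total width of a group by `mz_tol_da_api`. With the old comparison, a chain of closely spaced values could grow one group without limit. A minimal standalone sketch of the new grouping follows; `group_mzs` is a hypothetical name used only for this illustration.

```python
def group_mzs(mz_list, mz_tol_da):
    """Group m/z values so that every group spans at most mz_tol_da."""
    mzs = sorted(mz_list)
    if not mzs:
        return []
    groups = [[mzs[0]]]
    for x in mzs[1:]:
        # Anchor on the first (smallest) member of the current group,
        # so max(group) - min(group) can never exceed mz_tol_da
        if abs(x - groups[-1][0]) <= mz_tol_da:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


# Four values spaced 0.15 apart with a 0.2 Da tolerance: anchoring on the
# first member splits them into two groups instead of chaining all four.
print(group_mzs([100.0, 100.15, 100.3, 100.45], 0.2))
```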
@@ -722,32 +726,59 @@ def query_by_precursor(self, mz_list, polarity, mz_tol_ppm, mz_tol_da_api=0.2):
                 tol = (max(mz_group) - min(mz_group)) / 2 + mz_tol_ppm**-6 * max(
                     mz_group
                 )
-            lib = lib + self.get_query(
-                self.PRECURSOR_MZ_URL.format(str(mz), str(tol), polarity)
+
+            # Get first page of results
+            response = self.get_query(
+                self.PRECURSOR_MZ_URL.format(str(mz), str(tol), polarity, 1, max_per_page)
             )
+            lib = lib + response['results']
+
+            # If there are more pages of results, get them
+            if response['total_pages'] > 1:
+                for i in np.arange(2, response['total_pages']+1):
+                    lib = lib + self.get_query(
+                        self.PRECURSOR_MZ_URL.format(str(mz), str(tol), polarity, i, max_per_page)
+                    )['results']
 
         return lib
 
-    def request_all_precursors(self, polarity):
+    def request_all_precursors(self, polarity, per_page = 50000):
         """
-        Request all precursor m/z values from MetabRef.
+        Request all precursor m/z values for MS2 spectra from MetabRef.
 
         Parameters
         ----------
         polarity : str
             Ionization polarity, either "positive" or "negative".
+        per_page : int, optional
+            Number of records to fetch per call. Default is 50000
 
         Returns
         -------
         list
-            List of all precursor m/z values.
+            List of all precursor m/z values, sorted.
         """
         # If polarity is anything other than positive or negative, raise error
         if polarity not in ["positive", "negative"]:
             raise ValueError("Polarity must be 'positive' or 'negative'")
 
-        # Query MetabRef for all precursor m/z values
-        return self.get_query(self.PRECURSOR_MZ_ALL_URL.format(polarity))
+        precursors = []
+
+        # Get first page of results and total number of pages of results
+        response = self.get_query(self.PRECURSOR_MZ_ALL_URL.format(polarity, str(1), str(per_page)))
+        total_pages = response['total_pages']
+        precursors.extend([x['precursor_ion'] for x in response['results']])
+
+        # Go through remaining pages of results
+        for i in np.arange(2, total_pages + 1):
+            response = self.get_query(self.PRECURSOR_MZ_ALL_URL.format(polarity, str(i), str(per_page)))
+            precursors.extend([x['precursor_ion'] for x in response['results']])
+
+        # Sort precursors from smallest to largest and remove duplicates
+        precursors = list(set(precursors))
+        precursors.sort()
+
+        return precursors
 
     def get_lipid_library(
         self,
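
Review note: both `query_by_precursor` and `request_all_precursors` now follow the same shape: fetch page 1, read `total_pages`, then loop over the remaining pages and concatenate `results`. The loop could be factored as sketched below. This is a sketch under assumptions, not the CoreMS implementation: `fetch_all_pages` and `make_fake_api` are hypothetical names, and the fake API only mimics the `results` and `total_pages` keys seen in the diff so the example runs without network access.

```python
import math


def fetch_all_pages(fetch_page, per_page=50):
    """Collect 'results' from every page of a paginated endpoint.

    fetch_page(page, per_page) must return a dict with the keys
    'results' (list) and 'total_pages' (int), as in the diff above.
    """
    first = fetch_page(1, per_page)
    records = list(first["results"])
    # Pages are 1-indexed; page 1 is already consumed
    for page in range(2, first["total_pages"] + 1):
        records.extend(fetch_page(page, per_page)["results"])
    return records


def make_fake_api(items):
    """Build an in-memory stand-in for a paginated endpoint (illustration only)."""
    def fetch_page(page, per_page):
        start = (page - 1) * per_page
        total_pages = max(1, math.ceil(len(items) / per_page))
        return {"results": items[start:start + per_page], "total_pages": total_pages}
    return fetch_page


# Five records at two per page are spread over three pages
print(fetch_all_pages(make_fake_api(list(range(5))), per_page=2))
```

Passing the page fetcher in as a callable keeps the loop independent of the URL template, so one helper could serve both endpoints.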
@@ -789,14 +820,25 @@ def get_lipid_library(
 
         """
         mz_list.sort()
+        mz_list = np.array(mz_list)
 
         # Get all precursors in the library matching the polarity
         precusors_in_lib = self.request_all_precursors(polarity=polarity)
-        precusors_in_lib.sort()
         precusors_in_lib = np.array(precusors_in_lib)
 
         # Compare the mz_list with the precursors in the library, keep any mzs that are within mz_tol of any precursor in the library
-        mz_list = np.array(mz_list)
+        lib_mz_df = pd.DataFrame(precusors_in_lib, columns=["lib_mz"])
+        lib_mz_df["closest_obs_mz"] = mz_list[
+            find_closest(mz_list, lib_mz_df.lib_mz.values)
+        ]
+        lib_mz_df["mz_diff_ppm"] = np.abs(
+            (lib_mz_df["lib_mz"] - lib_mz_df["closest_obs_mz"])
+            / lib_mz_df["lib_mz"]
+            * 1e6
+        )
+        lib_mz_sub = lib_mz_df[lib_mz_df["mz_diff_ppm"] <= mz_tol_ppm]
+
+        # Do the same in the opposite direction
         mz_df = pd.DataFrame(mz_list, columns=["mass_feature_mz"])
         mz_df["closest_lib_pre_mz"] = precusors_in_lib[
             find_closest(precusors_in_lib, mz_df.mass_feature_mz.values)
@@ -808,9 +850,15 @@ def get_lipid_library(
         )
         mz_df_sub = mz_df[mz_df["mz_diff_ppm"] <= mz_tol_ppm]
 
+        # Evaluate which is fewer mzs - lib_mz_sub or mz_df_sub and use that as the input for next step
+        if len(lib_mz_sub) < len(mz_df_sub):
+            mzs_to_query = lib_mz_sub.lib_mz.values
+        else:
+            mzs_to_query = mz_df_sub.mass_feature_mz.values
+
         # Query the library for the precursors in the mz_list that are in the library to retrieve the spectra and metadata
         lib = self.query_by_precursor(
-            mz_list=mz_df_sub.mass_feature_mz.values,
+            mz_list=mzs_to_query,
             polarity=polarity,
             mz_tol_ppm=mz_tol_ppm,
             mz_tol_da_api=mz_tol_da_api,
@@ -830,6 +878,12 @@ def get_lipid_library(
             {k: v for k, v in x.items() if k not in ["Molecular Data", "Lipid Tree"]}
             for x in lib
         ]
+        # Unpack the 'Lipid Fragments' key and the 'MSO Data" key from each entry
+        for x in lib:
+            if "Lipid Fragments" in x.keys():
+                x.update(x.pop("Lipid Fragments"))
+            if "MSO Data" in x.keys():
+                x.update(x.pop("MSO Data"))
 
         # Format the spectral library
         format_func = self._get_format_func(format)
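
Review note: `get_lipid_library` now matches observed m/z values against library precursors in both directions and queries with whichever matched set is smaller. The idea can be sketched with plain NumPy as below. Everything here is an assumption made for illustration: `closest_index`, `within_ppm`, and `smaller_query_set` are hypothetical helpers, and `closest_index` is a stand-in built on `np.searchsorted`, not the `find_closest` function that CoreMS imports.

```python
import numpy as np


def closest_index(sorted_ref, targets):
    """Index of the nearest value in sorted_ref for each target.

    Assumes sorted_ref is sorted ascending and has at least two values.
    """
    sorted_ref = np.asarray(sorted_ref, dtype=float)
    targets = np.asarray(targets, dtype=float)
    idx = np.clip(np.searchsorted(sorted_ref, targets), 1, len(sorted_ref) - 1)
    left = sorted_ref[idx - 1]
    right = sorted_ref[idx]
    # Step back one position wherever the left neighbour is strictly closer
    idx -= (targets - left) < (right - targets)
    return idx


def within_ppm(query, ref_sorted, tol_ppm):
    """Keep the query values that lie within tol_ppm of their nearest reference value."""
    query = np.asarray(query, dtype=float)
    ref_sorted = np.asarray(ref_sorted, dtype=float)
    nearest = ref_sorted[closest_index(ref_sorted, query)]
    ppm = np.abs((query - nearest) / query) * 1e6
    return query[ppm <= tol_ppm]


def smaller_query_set(observed, library, tol_ppm):
    """Match in both directions and return whichever matched set has fewer values."""
    observed = np.sort(np.asarray(observed, dtype=float))
    library = np.sort(np.asarray(library, dtype=float))
    obs_hits = within_ppm(observed, library, tol_ppm)
    lib_hits = within_ppm(library, observed, tol_ppm)
    return lib_hits if len(lib_hits) < len(obs_hits) else obs_hits
```

When many observed features cluster around one library precursor, the library-side set is smaller, which should mean fewer groups and fewer API calls in the subsequent precursor query.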

corems/molecular_id/search/lcms_spectral_search.py

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ def get_more_match_quals(
         )
 
         # Get types of fragments in the lib entry
-        lib_frags = lib_entry["fragment_types"].split(", ")
+        lib_frags = lib_entry["fragment_types"]
         # make list of the fragment types that are present in the query spectrum
         lib_in_query_ids = list(
             set(
