Commit ffa5134

Add Lines and Models to spe_filter and patch text
1 parent a1a5ed4 commit ffa5134

1 file changed

Lines changed: 45 additions & 29 deletions

tutorials/parquet-catalog-demos/euclid-hats-parquet.md

@@ -393,33 +393,48 @@ pp_df = pp_df.set_index(OBJECT_ID).sort_index()
 pp_final_filter = pp_df["log10_ssfr"] < -8.2
 ```
 
-Load a quality SPE sample. Cuts are from Le Brun sec. 3.3.
+Load a quality SPE sample. Cuts are from Le Brun sec. 3.3 and 6.2.
 
 The NISP instrument was built to target Halpha emitting galaxies, which effectively means 0.9 < z < 1.8.
-SPE redshifts are reliable in that regime, but this represents <2% of the total delivered by the SPE pipeline.
-So it's crucial to make cuts in order to get it.
+SPE redshifts are reliable in that regime.
+However, this represents <2% of the total delivered by the SPE pipeline, so it's crucial to make cuts in order to get it.
 
 ```{code-cell}
 # SPE probability of the rank 0 (best) redshift estimate, assuming galaxy.
 SPE_GAL_Z_PROB = "Z_GALAXY_CANDIDATES_SPE_Z_PROB_RANK0"
+# [TODO] describe
+HALPHA_LINE_FLUX = "LINES_SPE_LINE_FLUX_GF_RANK0_Halpha"
+HALPHA_LINE_SNR = "LINES_SPE_LINE_SNR_GF_RANK0_Halpha"
+EMISSION_LINE_WIDTH = "MODELS_GALAXY_SPE_VEL_DISP_E_RANK0"
 
 # Columns we actually want to load.
 spe_columns = [SPE_GAL_Z, OBJECT_ID]
 
-# Partial filter for quality SPE galaxy redshifts.
-spe_filter = pc.field(SPE_GAL_Z_PROB) > 0.99
-# [FIXME] & (pc.field(linewidth) < 680). Add when Halpha line is added to the catalog.
-# Later, cut to z target range and
-# sec. 6.2: Halpha flux > 2e-16 erg /s/cm2 and SN>3.5
+# Filter for quality SPE galaxy redshifts.
+spe_filter = (
+    # Euclid's target redshift range.
+    (pc.field(SPE_GAL_Z) > 0.9)
+    & (pc.field(SPE_GAL_Z) < 1.8)
+    # MER quality
+    & (pc.field(SPURIOUS_FLAG) == 0)
+    & (pc.field("MER_DET_QUALITY_FLAG") < 4)
+    # High quality SPE galaxies.
+    & (pc.field(SPE_GAL_Z_PROB) > 0.99)  # [FIXME] Andreas says > 0.999. Also mentioned in sec. 6.2.
+    & (pc.field(EMISSION_LINE_WIDTH) < 680)
+    # Halpha emitters.
+    & (pc.field(HALPHA_LINE_FLUX) > 2e-16)
+    & (pc.field(HALPHA_LINE_SNR) > 3.5)  # [FIXME] Tiffany's notebook uses > 5.
+    # These make no difference
+    # & (pc.field("LINES_SPE_LINE_N_DITH_RANK0_Halpha") >= 3)  # all values in this column = 0
+    # & (pc.field("Z_SPE_N_DITH_MED") >= 3)
+    # & (pc.field("Z_SPE_ERROR_FLAG") == 0)
+    # & (pc.field("Z_SPE_GAL_ERROR_FLAG") == 0)
+)
 
 # Execute the filter and load.
 spe_df = dataset.to_table(columns=spe_columns, filter=spe_filter).to_pandas()
 spe_df = spe_df.set_index(OBJECT_ID).sort_index()
 # 27s
-
-# Final filter, to be applied later. Objects within target redshift range.
-spe_final_filter = (spe_df[SPE_GAL_Z] > 0.9) & (spe_df[SPE_GAL_Z] < 1.8)
-# [FIXME] Add more when the columns are available.
 ```
 
 Plot redshift distributions
@@ -445,22 +460,19 @@ ax.hist(pp_df.loc[pp_final_filter, PHYSPARAM_GAL_Z], **pp_kwargs, **hist_kwargs)
 # SPE
 spe_kwargs = dict(label=SPE_GAL_Z + " (filtered)", color=tbl_colors["SPE_GAL"], linestyle=":")
 ax.hist(spe_df[SPE_GAL_Z], **spe_kwargs, **hist_kwargs)
-# Impose our final cuts.
-spe_kwargs.update(label=SPE_GAL_Z + " (quality)", linestyle="-")
-ax.hist(spe_df.loc[spe_final_filter, SPE_GAL_Z], **spe_kwargs, **hist_kwargs)
 
 ax.set_xlabel("Redshift")
 ax.set_ylabel("Counts")
 plt.legend()
 ```
 
-The orange distribution is a quality sample of the redshifts (best point estimates) generated for cosmology by a Bayesian template-fitting code.
+The orange distribution is a quality sample of the redshifts (best point estimate) generated for cosmology by a Bayesian template-fitting code.
 The maximum is z ~ 6, due to the model's input parameters.
 Green represents redshifts that were generated to study galaxies' physical properties by a supervised learning, k-nearest neighbors algorithm.
-The maximum is z ~ 7, again due to model input parameters.
+The maximum is z ~ 7, again due to model inputs.
 Several quality cuts were applied to produce the dotted-line sample, but this still includes a population of problematic galaxies for which the solutions pointed to unrealistically young ages and very high specific star formation rates.
 The green solid line filters those out and represents a quality sample for this redshift estimate.
-Purple represents the spectroscopic redshifts (best point estimates)
+Purple represents the spectroscopic redshifts (best point estimate).
 The dotted line has been filtered for reliable (SPE) galaxy solutions and the maximum is z ~ 5.
 There is a clear bump between about 0.9 < z < 1.8 which results from a combination of the NISP instrument parameters (tuned to detect Halpha) and a model prior that strongly favored solutions in this regime.
 However much more drastic cuts are needed to obtain a trustworthy sample.
@@ -489,7 +501,7 @@ Compare PHZ to PHYSPARAM.
 Here, we reproduce Tucci Fig. 17 (left panel) except that we don't consider the problematic galaxies nor do we impose cuts on magnitude or region (EDF-F).
 
 ```{code-cell}
-# Get the common objects and set axes data x (PHZ) and y (PHYSPARAM).
+# Get the common objects and set axes data (PHZ on x, PHYSPARAM on y).
 phz_pp_df = phz_df.join(pp_df.loc[pp_final_filter], how="inner", lsuffix="phz", rsuffix="pp")
 x, y = phz_pp_df[PHZ_Z], phz_pp_df[PHYSPARAM_GAL_Z]
 one_to_one_linspace = np.linspace(-0.01, 6, 100)
@@ -513,8 +525,8 @@ The two outlier clouds are very roughly similar to those in Tucci Fig. 7 which w
 Compare PHZ to SPE
 
 ```{code-cell}
-# Get the common objects and set axes data x (PHZ) and y (SPE).
-phz_spe_df = phz_df.join(spe_df.loc[spe_final_filter], how="inner", lsuffix="phz", rsuffix="spe")
+# Get the common objects and set axes data (PHZ on x, SPE on y).
+phz_spe_df = phz_df.join(spe_df, how="inner", lsuffix="phz", rsuffix="spe")
 x, y = phz_spe_df[PHZ_Z], phz_spe_df[SPE_GAL_Z]
 one_to_one_linspace = np.linspace(0.89, 1.81, 100)
 
@@ -751,7 +763,7 @@ plt.tight_layout()
 
 The template - aperture magnitude difference is fairly tightly clustered around 0 for extended objects (top row) but the outliers are asymmetric (fractions above and below zero are noted).
 We see a positive offset which indicates a fainter template-fit magnitude, as we should expect given that the templates do a better job of excluding contaminating light from nearby sources.
-The offset is more pronounced for point-like objects, likely due to the PSF handling mentioned above, and we are reminded that aperture magnitudes are more reliable here.
+The offset is more pronounced for point-like objects (bottom row), likely due to the PSF handling mentioned above, and we are reminded that aperture magnitudes are more reliable here.
 
 +++
 
@@ -861,14 +873,14 @@ nironly_nonspurious_filter = (pc.field(VIS_DET) == 0) & (pc.field(SPURIOUS_FLAG)
 ```
 
 NIR-only objects are a mix of nearby brown dwarfs and high-redshift galaxies & quasars.
-These two broad, but very different, types of objects overlap in relevant color spaces and can be difficult to separate.
+These two broad but very different types of objects overlap in relevant color spaces and can be difficult to separate.
 (Weaver et al., 2024).
 Spectra will often be required to confirm membership, but we can use photometric properties produced by PHZ to make some useful cuts first.
 We'll track the following three objects to illustrate:
 
-- OBJECT_ID: -523574860290315045. T4 dwarf, discovered spectroscopically ([Dominguez-Tagle et al., 2025](https://arxiv.org/abs/2503.22442)).
-- OBJECT_ID: -600367386508373277. L-type dwarf, spectroscopically confirmed ([Zhang, Lodieu, and Martín, 2024](https://arxiv.org/abs/2403.15288) Table C.2. '04:00:08.99 −50:50:14.4'. Found in Q1 via cone search; separation = 1.6 arcsec).
-- OBJECT_ID: -531067351279302418. Star-forming galaxy at z=5.78, spectroscopically confirmed ([Bunker et al., 2003](https://arxiv.org/abs/astro-ph/0302401). Found in Q1 via cone search; separation = 0.59 arcsec).
+- OBJECT_ID: -523574860290315045. **T4 dwarf**, discovered spectroscopically ([Dominguez-Tagle et al., 2025](https://arxiv.org/abs/2503.22442)).
+- OBJECT_ID: -600367386508373277. **L-type dwarf**, spectroscopically confirmed ([Zhang, Lodieu, and Martín, 2024](https://arxiv.org/abs/2403.15288) Table C.2. '04:00:08.99 −50:50:14.4'. Found in Q1 via cone search; separation = 1.6 arcsec).
+- OBJECT_ID: -531067351279302418. **Star-forming galaxy at z=5.78**, spectroscopically confirmed ([Bunker et al., 2003](https://arxiv.org/abs/astro-ph/0302401). Found in Q1 via cone search; separation = 0.59 arcsec).
 
 ```{code-cell}
 targets = {
@@ -972,7 +984,9 @@ targets_columns = [
     # "Z_QSO_CANDIDATES_SPE_PDF_ZMAX_RANK0",
     # "Z_QSO_CANDIDATES_SPE_PDF_DELTAZ_RANK0",
 ]
+```
 
+```{code-cell}
 # Load data.
 targets_filter = pc.field(OBJECT_ID).isin(targets.keys())
 targets_df = dataset.to_table(columns=targets_columns, filter=targets_filter).to_pandas()
@@ -1023,7 +1037,7 @@ for ax, (target_id, (target_name, target_color)) in zip(axes, targets.items()):
 
 In the left panel (T dwarf), we see that the photo-z PDF produced by the NIR-only branch is very strongly peaked at z=0 with a small secondary bump near z=7, consistent with its placement in the previous figure.
 Recall that PHZ_PHZ_PDF was produced using galaxy models, regardless of the object's class.
-In the middle panel (L dwarf), we see the multi-peaked NIR PDF that was guessed at based on the previous figure.
+In the middle panel (L dwarf), we see the broad and multi-peaked NIR PDF that was guessed at based on the previous figure.
 While the strongest peak is near z=8 (a QSO solution, perhaps?), there are also peaks at z=0 (consistent with a star) and near z=1.75 (consistent with the PHZ (galaxy) solution) which are prominent enough to reduce the probability of z>6 below the 0.8 threshold.
 In the right panel (Galaxy), we see good agreement between the PDFs except at z=0.
 
@@ -1054,7 +1068,7 @@ s3_filesystem = pyarrow.fs.S3FileSystem()
 schema = pyarrow.parquet.read_schema(euclid_parquet_schema_path, filesystem=s3_filesystem)
 ```
 
-There are more than 1300 columns in this dataset.
+There are almost 1600 columns in this dataset.
 
 ```{code-cell}
 print(f"{len(schema)} columns total")
@@ -1069,6 +1083,7 @@ To find all columns from a given table, search for column names that start with
 ```{code-cell}
 # Find all column names from the PHZ table.
 phz_columns = [name for name in schema.names if name.startswith("PHZ_")]
+
 print(f"{len(phz_columns)} columns from the PHZ table. First four are:")
 phz_columns[:4]
 ```
@@ -1085,6 +1100,7 @@ They are given in microjanskys, so all flux columns can be found by searching th
 ```{code-cell}
 # Find all flux columns.
 flux_columns = [field.name for field in schema if field.metadata[b"unit"] == b"uJy"]
+
 print(f"{len(flux_columns)} flux columns. First four are:")
 flux_columns[:4]
 ```
@@ -1125,6 +1141,6 @@ schema.names[-5:]
 
 **Authors:** Troy Raen (Developer; Caltech/IPAC-IRSA) and the IRSA Data Science Team.
 
-**Updated:** 2025-06-16
+**Updated:** 2025-06-29
 
 **Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems.
