tutorials/parquet-catalog-demos/euclid-hats-parquet.md
Lines changed: 74 additions & 70 deletions
@@ -13,6 +13,10 @@ kernelspec:

 # Euclid Q1 Catalogs in HATS Parquet

+This notebook introduces the [Euclid Q1](https://irsa.ipac.caltech.edu/data/Euclid/docs/overview_q1.html) HATS Collection served by IPAC/IRSA and demonstrates access with python.
+
++++
+
 ## Learning Goals

 By the end of this tutorial, you will:
@@ -23,9 +27,8 @@ By the end of this tutorial, you will:

 +++

-## Introduction
+## 1. Introduction

-This notebook introduces the [Euclid Q1](https://irsa.ipac.caltech.edu/data/Euclid/docs/overview_q1.html) HATS Collection served by IPAC/IRSA and demonstrates access with python.
 The Collection includes a HATS Catalog (main data product), Margin Cache (10 arcsec), and Index Table (OBJECT_ID).
 The Catalog includes the twelve Euclid Q1 tables listed below, joined on the column 'OBJECT_ID' into a single Parquet dataset with 1,329 columns (one row per Euclid MER Object).
 Among them, Euclid has provided several different redshift measurements, several flux measurements for each Euclid band, and flux measurements for bands from several ground-based observatories -- in addition to morphological and other measurements.
@@ -34,12 +37,12 @@ These were produced for different science goals using different algorithms and/o
 Having all columns in the same dataset makes access convenient because the user doesn't have to make separate calls for data from different tables and/or join the results.
 However, figuring out which, e.g., flux measurements to use amongst so many can be challenging.
 In the sections below, we look at some of their distributions and reproduce figures from several papers in order to highlight some of the options and point out their differences.
-The Appendix contains important information about the schema of this Parquet dataset, especially the syntax of the column names.
+The Appendix explains how the columns in this Parquet dataset are named and organized.
 For more information about the meaning and provenance of a column, refer to the links provided with the list of tables below.

-### Euclid Q1 tables and docs
+### 1.1 Euclid Q1 tables and docs

-The Euclid Q1 HATS Catalog includes the following twelve Q1 tables[*], which are organized underneath the Euclid processing function (MER, PHZ, or SPE) that created it.
+The Euclid Q1 HATS Catalog includes the following twelve Q1 tables, which are organized underneath the Euclid processing function (MER, PHZ, or SPE) that created it.
 Links to the Euclid papers describing the processing functions are provided, as well as pointers for each table.
 Table names are linked to their original schemas.

@@ -65,9 +68,7 @@ See also:
 - [Frequently Asked Questions About Euclid Q1 data](https://euclid.caltech.edu/page/euclid-q1-data-faq) (hereafter, FAQ)
 [*] Euclid typically calls these "catalogs", but this notebook uses "tables" to avoid any confusion with the HATS Catalog product.
-
-### Parquet, HEALPix, and HATS
+### 1.2 Parquet, HEALPix, and HATS

 Parquet, HEALPix, and HATS are described in more detail at [https://irsadev.ipac.caltech.edu:9051/cloud_access/parquet/](https://irsadev.ipac.caltech.edu:9051/cloud_access/parquet/).
 ([FIXME] Currently requires IPAC VPN. Update url when the page is published to ops.)
@@ -93,7 +94,7 @@ In brief:

 +++

-## Installs and imports
+## 2. Installs and imports

 +++

@@ -109,18 +110,21 @@ We rely on ``lsdb>=0.5.2``, ``hpgeom>=1.4``, ``numpy>=2.0``, and ``pyerfa>=2.0.1
 ```

 ```{code-cell}
-import os  # Determine number of CPUs (for parallelization)
+PHZ classifications. These were generated by a probabilistic random forest supervised ML algorithm.

-# PHZ classifications were generated by a probabilistic random forest supervised ML algorithm.
+```{code-cell}
 PHZ_CLASS = "PHZ_PHZ_CLASSIFICATION"
 PHZ_CLASS_MAP = {
     1: "Star",
@@ -251,28 +271,19 @@ PHZ_CLASS_MAP = {
 }
 ```

-### 1.5 Euclid Deep Fields
+### 3.5 Euclid Deep Fields

 +++

+[FIXME] The notebook does not currently use these. Should either use them or remove them.
+
 Euclid Q1 includes data from three Euclid Deep Fields: EDF-N (North), EDF-S (South), EDF-F (Fornax; also in the southern hemisphere).
 There is also a small amount of data from a fourth field: LDN1641 (Lynds' Dark Nebula 1641), which was observed for technical reasons during Euclid's verification phase and mostly ignored here.
-There are two notable differences between regions:
-
-- EDF-N is closest to the galactic plane and thus contains a larger fraction of stars.
-- Different external data was available in EDF-N (DES with g, r, i, and z bands) vs EDF-S+F (UNIONS with u, g, r, i, and z bands -- UNIONS is a collaboration between CFIS, Pan-STARRS, HSC, WHIGS, and WISHES).
-The Euclid processing pipelines used the external data to supplement Euclid data to, for example, measure colors that were then used for PHZ classifications.
-Differences between the available data is the cause of various differences in pipeline handling and results.
-
-The EDF regions are well separated, so we can distinguish them using a simple cone search without having to be too picky about the radius.
+The regions are well separated, so we can distinguish them using a simple cone search without having to be too picky about the radius.
 Rather than using the RA and Dec values directly, we'll find a set of HEALPix order 9 pixels that cover each area.
 A column ('_healpix_9') of order 9 indexes was added to the catalog for this purpose.
 These will suffice for a simple and efficient cone search.

-[FIXME] The notebook does not currently use these but it might be good to do so.
-Maybe in the Magnitudes section to show the differences as a function of class.
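
The order-9 HEALPix cone search described in the Euclid Deep Fields text above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's code: the pixel indexes and rows are made up, and `select_in_cone` is a hypothetical helper. In practice the covering pixel set would presumably come from a HEALPix library such as hpgeom, which the notebook lists as a dependency. Only the column name `_healpix_9` and the order (9) come from the text.

```python
import math

# HEALPix order 9, matching the '_healpix_9' column described in the text.
ORDER = 9
NSIDE = 2 ** ORDER        # pixels along each side of a base pixel
NPIX = 12 * NSIDE ** 2    # total number of pixels on the sky

# Approximate area of one order-9 pixel, in square arcminutes.
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2
PIXEL_AREA_ARCMIN2 = FULL_SKY_DEG2 / NPIX * 3600


def select_in_cone(rows, cone_pixels):
    """Keep rows whose '_healpix_9' index is in the set of pixels covering the cone.

    Hypothetical helper for illustration; membership in the pixel set stands in
    for an exact RA/Dec distance test.
    """
    cone_pixels = set(cone_pixels)
    return [row for row in rows if row["_healpix_9"] in cone_pixels]


# Made-up pixel indexes and rows, for illustration only.
edf_pixels = {100, 101, 102}
rows = [
    {"OBJECT_ID": 1, "_healpix_9": 100},
    {"OBJECT_ID": 2, "_healpix_9": 555},
    {"OBJECT_ID": 3, "_healpix_9": 102},
]
selected = select_in_cone(rows, edf_pixels)
```

Because each order-9 pixel covers roughly 47 square arcminutes, the selection is only accurate to about a pixel width at the cone's edge, which is acceptable here since the fields are well separated.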
-# Plot point-like morphology vs brightness as a function of class.
-# Here, we reproduce the first three panels of Tucci Fig. 6, combining top and bottom.
 ```

+Plot point-like morphology vs brightness as a function of class.
+Here, we reproduce the first three panels of Tucci Fig. 6, combining top and bottom.
+
 ```{code-cell}
 fig, axes = plt.subplots(1, 3, figsize=(20, 6))
 for ax, (class_name, class_df) in zip(axes, classes_df.groupby(PHZ_CLASS)):
@@ -579,7 +583,8 @@ for ax, (class_name, class_df) in zip(axes, classes_df.groupby(PHZ_CLASS)):
     ax.set_ylim(15, 27)
 ```

-Objects to the left of the vertical line are point-like.
+MER_MUMAX_MINUS_MAG is the peak surface brightness above the background minus the magnitude that was used to compute MER_POINT_LIKE_PROB.
+Objects to the left of the vertical line (<-2.5) are point-like.
 Stars are highly concentrated there, especially those that are not faint (I < 24.5), which we should expect given Euclid's requirement for a pure sample.
 Also as we should expect, most galaxies appear to the right of this line.
 However, notice the strip of bright (e.g., I < 23) "galaxies" that are point-like.
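
The point-like cut described above (objects left of the vertical line at -2.5 in MER_MUMAX_MINUS_MAG) amounts to a simple threshold test. The sketch below is illustrative only: the object values are invented, `is_point_like` is a hypothetical helper, and treating the boundary as a strict inequality is an assumption based on the "<-2.5" wording.

```python
# Threshold taken from the text: objects with MER_MUMAX_MINUS_MAG < -2.5 are point-like.
POINT_LIKE_THRESHOLD = -2.5


def is_point_like(mumax_minus_mag):
    """Return True if the value falls to the left of the vertical line.

    Hypothetical helper; the strict '<' at the boundary is an assumption.
    """
    return mumax_minus_mag < POINT_LIKE_THRESHOLD


# Invented example values, for illustration only.
objects = [
    {"OBJECT_ID": 1, "MER_MUMAX_MINUS_MAG": -2.9},
    {"OBJECT_ID": 2, "MER_MUMAX_MINUS_MAG": -1.2},
    {"OBJECT_ID": 3, "MER_MUMAX_MINUS_MAG": -2.5},
]
point_like_ids = [
    obj["OBJECT_ID"] for obj in objects if is_point_like(obj["MER_MUMAX_MINUS_MAG"])
]
```

Applying the same test within each PHZ class would be one way to isolate the point-like "galaxies" noted above.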
@@ -591,7 +596,7 @@ Many QSOs are likely to be missing from the expected region due to the overlap o