Commit 0b6bb72

Adding DSC180 content
1 parent 8732751 commit 0b6bb72

9 files changed

Lines changed: 243 additions & 1 deletion

_config.yml

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ footer:
       # url:
     - label: "GitHub"
       icon: "fab fa-fw fa-github"
-      url: "https://github.com/CompClimate"
+      url: "https://github.com/climate-analytics-lab"
     - label: "GitLab"
       icon: "fab fa-fw fa-gitlab"
       # url:

_pages/dsc_180.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
---
permalink: /dsc_180/
title: "Deep Learning for Climate Model Emulation"
layout: single
toc: true
toc_sticky: true
---
### Data Science Capstone Domain - DSC 180AB
### Section B03 (TA: Yanyi)


## Introduction to Topic

The choices humanity makes in the next few decades will determine how much warmer the Earth will be by the end of the century, with implications for billions of lives and trillions of dollars in GDP. Many different emission pathways exist that are compatible with the Paris climate agreement, and many more are possible that miss that target. While some of the most complex climate models have simulated a small selection of these, it is impractical to use these computationally expensive models to fully explore the space of possibilities or assess all the associated risks. Our lab has recently developed state-of-the-art climate model emulators to enable fast, accurate and reliable predictions for any given scenario (https://github.com/duncanwp/ClimateBench).


## Phase I - Replication

The aim of reproducing a paper's results is to affirm the original authors' findings and methodologies. This process is vital in science to ensure results are robust and reliable, not merely due to chance or error. Reproduction would reinforce the evidence that the constructed emulators faithfully reproduce the underlying climate model and can be trusted for such tasks. It also provides a deeper understanding of the applied methods, such as long short-term memory networks. Ultimately, this endeavor seeks to enable fast and efficient sampling of different climate scenarios to improve decision making.

The paper we will be working with, linked below, is:
> **Watson-Parris, D.**, Rao, Y., Olivié, D., Seland, Ø., ... "ClimateBench v1.0: A benchmark for data-driven climate projections". *Journal of Advances in Modeling Earth Systems 14, e2021MS002954*: <https://doi.org/10.1029/2021MS002954>


### Accessing the ClimateBench Dataset

While the processed dataset is publicly available, generating it yourselves will be instructive and is an important part of the replication process. You will be provided with access to the [Casper](https://arc.ucar.edu/knowledge_base/70549550) data analysis cluster at the National Center for Atmospheric Research (NCAR), with sufficient resources to perform the analyses throughout the project. Please note that this is a national facility with shared resources, so be mindful of your requests and be sure to abide by their rules.

The data we will use comes from the sixth Coupled Model Intercomparison Project (CMIP6), which represents the combined efforts of dozens of international research laboratories running hundreds of thousands of simulation years of experiments. The data (all 30 petabytes!) is publicly archived and available, e.g., here: https://esgf-index1.ceda.ac.uk/projects/esgf-ceda/, and was also recently mirrored to the cloud here: https://registry.opendata.aws/cmip6/. Fortunately, all the data you will need is already available on Casper, so you shouldn't need to download any large datasets, which can be quite cumbersome.

### Schedule

Click the "topic" links below for details regarding the readings, questions, and tasks for that week.

| Week | Topic |
| --- | --- |
| Summer | [Summer preparation](dsc_180_summer) |
| 1 | [Introduction to topic, domain, and paper](dsc_180_intro) |
| 2-3 | [Dive into the ClimateBench dataset](dsc_180_data) |
| 4-5 | [Begin data preprocessing and learn about xarray](dsc_180_xarray) |
| 6-7 | [Start implementing regression models](dsc_180_implement) |
| 8-9 | [Perform validation and testing of baselines](dsc_180_validate) |
| 10 | [Project wrap up and debrief](dsc_180_debrief) |



## Phase II

TBD.

Possible options for extension include:
- Incorporating multiple climate models at different levels of fidelity
- Improving the regression models or dimensionality reduction approaches, perhaps using graph-based regularization
- Extending the variables that are emulated

## Section & Group Participation

Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for doing the reading/task assigned in the schedule and submitting answers to the listed questions before discussion section begins.

Weekly assigned questions help me to observe how you are all doing on the project, as well as to focus your work for the week and help prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submission answers).


## Office Hours

Wednesdays 10-11am in HDSI room 325.

You're also welcome to join our group meetings on Mondays at 1pm in Nierenberg Hall room 432 at SIO.

_pages/dsc_180_data.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
---
permalink: /dsc_180_data/
title: "DSC 180ab B03 - Dive into the ClimateBench dataset"
layout: single
---

Let's get hands-on with the data!

### Topics

You will be touching on the following topics.

- Exploring the CMIP6 data and processing the NorESM2 output
- Processing the input4MIPs emissions data used to drive the models


### Tasks

- Process the NorESM2 data for each of the required experiments. The data can be found in `/glade/collections/cmip/CMIP6/{activity}/NCC/NorESM2-LM/{experiment}`
- Process the input data. The data can be found in `/glade/p/cesmdata/cseg/inputdata/atm/cam/chem/emis/`
- You will need to take the global averages of some quantities and regrid others
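
As a starting point for the global-averaging task, here is a minimal sketch of an area-weighted mean in xarray. It runs on a synthetic field rather than the real NorESM2 files, and the names `global_mean` and `tas`, the dimension names `lat`/`lon`, and the 5-degree grid are assumptions made for illustration. Cosine-of-latitude weighting assumes a regular latitude-longitude grid; it is a common approach and not necessarily the exact procedure used in the paper. Regridding is not covered here.

```python
import numpy as np
import xarray as xr


def global_mean(da: xr.DataArray, lat_name: str = "lat", lon_name: str = "lon") -> xr.DataArray:
    """Area-weighted mean over latitude and longitude.

    On a regular lat-lon grid, cell area is proportional to cos(latitude),
    so the cosines are used as weights. (Hypothetical helper, not from the paper.)
    """
    weights = np.cos(np.deg2rad(da[lat_name]))
    weights.name = "weights"
    return da.weighted(weights).mean(dim=(lat_name, lon_name))


# Synthetic stand-in for a model field: 3 time steps on a coarse 5-degree grid.
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(2.5, 360.0, 5.0)
time = np.arange(3)

# Largest at the equator, zero towards the poles, increasing by 1 per time step.
pattern = 30.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((lat.size, lon.size))
data = pattern[None, :, :] + time[:, None, None]
tas = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

weighted = global_mean(tas)
unweighted = tas.mean(dim=("lat", "lon"))

print("area-weighted:", weighted.values)
print("unweighted:   ", unweighted.values)
```

The unweighted mean treats the small polar grid cells the same as the large equatorial ones, so it comes out lower than the area-weighted mean for this equator-heavy field. With real data you would open the files with `xr.open_dataset` or `xr.open_mfdataset` and check the actual dimension names first.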

### Questions

Make sure your presentation is polished and answers the following questions:

1. Could your group successfully replicate the findings of the paper, even if only partially? Provide context on the factors that contributed to your success or failure in replication. For instance, did you find any disagreement with the methodology used in the paper?
2. Please present the final visualizations created during the course of this project.
3. Summarize the key concepts and topics that you gained insights into through this project.
4. Engage in brainstorming to come up with potential ideas for further expanding this research, or propose innovative projects related to sepsis utilizing the MIMIC-III/IV dataset.

_pages/dsc_180_debrief.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
---
permalink: /dsc_180_debrief/
title: "DSC 180ab B03 - Project wrap up and debrief"
layout: single
---

Wrap up week! Great job everyone.

### Topics

You will be touching on the following topics.

- Making a final determination on replication
- Presenting your findings


### Tasks

- Create a final replication presentation. This should be comprehensive and can be built off of the presentation your group gave a few weeks back.
- Brainstorm ideas for Phase II
- Enjoy cupcakes during the final section meeting/presentation! On me.

### Final Presentation

Make sure your presentation is polished and answers the following questions:

1. Could your group successfully replicate the findings of the paper, even if only partially? Provide context on the factors that contributed to your success or failure in replication. For instance, did you find any disagreement with the methodology used in the paper?
2. Please present the final visualizations created during the course of this project.
3. Summarize the key concepts and topics that you gained insights into through this project.
4. Engage in brainstorming to come up with potential ideas for further expanding this research, or propose innovative projects related to sepsis utilizing the MIMIC-III/IV dataset.

_pages/dsc_180_implement.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
permalink: /dsc_180_implement/
title: "DSC 180ab B03 - Start implementing regression models"
layout: single
---


### Topics

You will be touching on the following topics:

-

### Tasks

-

### Questions

1.

_pages/dsc_180_intro.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
---
permalink: /dsc_180_intro/
title: "DSC 180ab B03 - Introduction to topic, domain, and paper"
layout: single
toc: true
toc_sticky: true
---


Welcome, Welcome, Welcome!

Week 1 is upon us and we are going to spend it getting everyone situated, going over the primary paper, and beginning to read up on the domain of climate model simulation and emulation.

### Topics

You will be touching on these topics over the first week:

- What is sepsis, how it can be diagnosed/treated and why it is very deadly from an epidemiological point of view
- Severity of illness scores
- EHR data
- Reproducibility/replicability in data science

### Tasks

- Gain access to the MIMIC-III and MIMIC-IV datasets, see instructions on home page
- Complete all assigned readings
- Begin to familiarize yourself with the domain by thumbing through the Domain Expertise page.
- Submit question answers to the online form (linked below)
- Join the class Discord Channel: TBD

### Readings

- Read the capstone primary paper by Zador et al. linked on the home page
- MIMIC-III paper by Johnson et al. link - if you have not already finished it
- MIMIC-III web description link - if you have not already finished it

### Questions

Answer the following questions using this Google Form link

1. What are the primary goals of the research paper we aim to replicate? Additionally, what do you anticipate to be the major hurdles in this process? Lastly, define what outcomes you would consider a successful completion of Phase I in this capstone project.
2. Provide a detailed explanation of what an Electronic Health Record (EHR) is. How does EHR relate to and integrate with the MIMIC datasets?
3. Explain the concept and utility of Illness Scores in healthcare. Why do multiple illness scores exist? Select one illness score from the following - OASIS, SAPS II, or SOFA. Write a brief overview, including its applications and key attributes. Note: The aim is for each team member to gain expertise in at least one illness score, so ensure that all three scores are explored by different members.

_pages/dsc_180_summer.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
permalink: /dsc_180_summer/
title: "DSC 180ab B03 - Summer tasks"
layout: single
---

I don't expect much over the summer, but the following activities would be useful and allow you to hit the ground running in the fall:

- Skim the latest UN Intergovernmental Panel on Climate Change Synthesis Report to get a summary of the latest climate change science, especially the figures: https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_SPM.pdf
- Read the ClimateBench paper: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954
- Try out the xarray Python library for working with climate data: https://docs.xarray.dev/en/stable/
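
If you want something small to run when first trying xarray, the sketch below may help. It uses made-up monthly values rather than real climate data (the variable name `tas`, the dates, and the numbers are all assumptions for illustration) and shows two ideas that come up constantly with climate datasets: selecting by coordinate label and grouping by calendar year.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of made-up monthly values at a single location.
time = pd.date_range("2015-01-01", periods=24, freq="MS")
values = np.concatenate([np.full(12, 14.0), np.full(12, 15.0)])
tas = xr.DataArray(
    values,
    coords={"time": time},
    dims="time",
    name="tas",
    attrs={"units": "degC"},
)

# Label-based selection: pick a period by date rather than by integer index.
first_year = tas.sel(time=slice("2015-01-01", "2015-12-31"))

# Split-apply-combine: average the monthly values within each calendar year.
annual = tas.groupby("time.year").mean()

print(first_year.sizes["time"])
print(annual.to_series())
```

Note that this simple annual mean gives every month equal weight regardless of its length, which is usually fine for a first look but worth keeping in mind for careful analysis.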

_pages/dsc_180_validate.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
permalink: /dsc_180_validate/
title: "DSC 180ab B03 - Perform validation and testing of baselines"
layout: single
---


### Topics

You will be touching on the following topics:

-

### Tasks

-

### Questions

1.

_pages/dsc_180_xarray.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
permalink: /dsc_180_xarray/
title: "DSC 180ab B03 - Begin data preprocessing and learn about xarray"
layout: single
---


### Topics

You will be touching on the following topics:

-

### Tasks

-

### Questions

1.
