Adding DSC180 content

duncanwp · duncanwp · commit 0b6bb72c4be9 · 2023-10-03T17:51:25.000-07:00
diff --git a/_config.yml b/_config.yml
@@ -148,7 +148,7 @@ footer:
       # url:
     - label: "GitHub"
       icon: "fab fa-fw fa-github"
-      url: "https://github.com/CompClimate"
+      url: "https://github.com/climate-analytics-lab"
     - label: "GitLab"
       icon: "fab fa-fw fa-gitlab"
       # url:
diff --git a/_pages/dsc_180.md b/_pages/dsc_180.md
@@ -0,0 +1,67 @@
+---
+permalink: /dsc_180/
+title: "Deep Learning for Climate Model Emulation"
+layout: single
+toc: true
+toc_sticky: true
+---
+### Data Science Capstone Domain - DSC 180AB
+### Section B03 (TA: Yanyi)
+
+
+## Introduction to Topic
+
+The choices humanity makes in the next few decades will determine how much warmer the Earth will be by the end of the century, with implications for billions of lives and trillions of dollars in GDP. Many different emission pathways exist that are compatible with the Paris climate agreement, and many more are possible that miss that target. While some of the most complex climate models have simulated a small selection of these, it is impractical to use these computationally expensive models to fully explore the space of possibilities or assess all the associated risks. Our lab has recently developed state-of-the-art climate model emulators to enable fast, accurate and reliable predictions for any given scenario (https://github.com/duncanwp/ClimateBench). 
+
+
+## Phase I - Replication
+
+The aim of reproducing a paper's results is to affirm the original authors' findings and methodologies. This process is vital in science to ensure results are robust and reliable, not merely due to chance or error. Reproduction would reinforce the evidence that the constructed emulators are faithfully reproducing the underlying climate model and can be trusted for such tasks. It also provides a deeper understanding of the applied methods like long short-term memory networks. Ultimately, this endeavor seeks to enable fast and efficient sampling of different climate scenarios to improve decision making.
+
+The paper we will be replicating, linked below, is:
+>  **Watson-Parris, D.**, Rao, Y., Olivié, D., Seland, Ø., ... "ClimateBench v1.0: A benchmark for data-driven climate projections". *Journal of Advances in Modeling Earth Systems 14, e2021MS002954*: <https://doi.org/10.1029/2021MS002954>
+
+
+### Accessing the ClimateBench Dataset
+
+While the processed dataset is publicly available, it will be instructive for you to generate it yourselves, and doing so is an important part of the replication process. You will be provided with access to the [Casper](https://arc.ucar.edu/knowledge_base/70549550) data analysis cluster at the National Center for Atmospheric Research (NCAR), which has sufficient resources to perform the analyses throughout the project. Please note this is a national facility with shared resources, so be mindful of your requests and be sure to abide by their rules.
+
+The data we will use is available from the sixth Coupled Model Intercomparison Project (CMIP6), which represents the combined efforts of dozens of international research laboratories running hundreds of thousands of simulation years of experiments. The data (all 30 petabytes!) is publicly archived and available, e.g., here: https://esgf-index1.ceda.ac.uk/projects/esgf-ceda/, and was also recently mirrored to the cloud here: https://registry.opendata.aws/cmip6/. Fortunately, all the data you will need is already available on Casper, so you shouldn't need to download any large datasets, which can be quite cumbersome.
+
+### Schedule
+
+Click the "topic" links below for details regarding the readings, questions, and tasks for that week.
+
+| Week | Topic |
+| --- | --- |
+| Summer | [Summer preparation](dsc_180_summer) |
+| 1 | [Introduction to topic, domain, and paper](dsc_180_intro) |
+| 2-3 | [Dive into the ClimateBench dataset](dsc_180_data) |
+| 4-5 | [Begin data preprocessing and learn about xarray](dsc_180_xarray) |
+| 6-7 | [Start implementing regression models](dsc_180_implement) |
+| 8-9 | [Perform validation and testing of baselines](dsc_180_validate) |
+| 10 | [Project wrap up and debrief](dsc_180_debrief) |
+
+
+
+## Phase II
+
+TBD. 
+
+Possible options for extension include:
+- Incorporating multiple climate models at different levels of fidelity
+- Improving the regression models or dimensionality reduction approaches, perhaps using graph-based regularization
+- Extending the variables that are emulated 
+
+## Section & Group Participation
+
+Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for doing the reading/task assigned in the schedule and submitting answers to the listed questions before the discussion section begins.
+
+The weekly assigned questions help me see how you are all doing on the project, focus your work for the week, and prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submitted answers).
+
+
+## Office Hours
+
+Wednesdays 10-11am in HDSI room 325.
+
+You're also welcome to join our group meetings on Mondays at 1pm in Nierenberg Hall room 432 at SIO. 
diff --git a/_pages/dsc_180_data.md b/_pages/dsc_180_data.md
@@ -0,0 +1,30 @@
+---
+permalink: /dsc_180_data/
+title: "DSC 180ab B03 - Dive into the ClimateBench dataset"
+layout: single
+---
+
+Let's get hands-on with the data!
+
+### Topics
+
+You will be touching on the following topics.
+
+- Exploring the CMIP6 data and processing the NorESM2 output
+- Processing the input4MIPS emissions data used to drive the models
+
+
+### Tasks
+
+- Process the NorESM2 data for each of the required experiments. The data can be found in `/glade/collections/cmip/CMIP6/{activity}/NCC/NorESM2-LM/{experiment}`
+- Process the input data. The data can be found in `/glade/p/cesmdata/cseg/inputdata/atm/cam/chem/emis/`
+  - You will need to take the global averages of some quantities and regrid others
+
+### Questions
+
+Make sure your presentation is polished and answers the following questions:
+
+1. Could your group successfully replicate the findings of the paper, even if only partially? Provide context on the factors that contributed to your success or failure in replication. For instance, did you find any disagreement with the methodology used in the paper?
+2. Please present the final visualizations created during the course of this project.
+3. Summarize the key concepts and topics that you gained insights into through this project.
+4. Brainstorm potential ideas for further expanding this research, or propose innovative projects related to climate model emulation using the ClimateBench dataset.
diff --git a/_pages/dsc_180_debrief.md b/_pages/dsc_180_debrief.md
@@ -0,0 +1,30 @@
+---
+permalink: /dsc_180_debrief/
+title: "DSC 180ab B03 - Project wrap up and debrief"
+layout: single
+---
+
+Wrap up week! Great job everyone.
+
+### Topics
+
+You will be touching on the following topics.
+
+- Making a final determination on replication
+- Presenting your findings
+
+
+### Tasks
+
+- Create a final replication presentation. It should be comprehensive and can build on the presentation your group gave a few weeks back.
+- Brainstorm ideas for Phase II
+- Enjoy cupcakes during the final section meeting/presentation! On me.
+
+### Final Presentation
+
+Make sure your presentation is polished and answers the following questions:
+
+1. Could your group successfully replicate the findings of the paper, even if only partially? Provide context on the factors that contributed to your success or failure in replication. For instance, did you find any disagreement with the methodology used in the paper?
+2. Please present the final visualizations created during the course of this project.
+3. Summarize the key concepts and topics that you gained insights into through this project.
+4. Brainstorm potential ideas for further expanding this research, or propose innovative projects related to climate model emulation using the ClimateBench dataset.
diff --git a/_pages/dsc_180_implement.md b/_pages/dsc_180_implement.md
@@ -0,0 +1,20 @@
+---
+permalink: /dsc_180_implement/
+title: "DSC 180ab B03 - Start implementing regression models"
+layout: single
+---
+
+
+### Topics
+
+You will be touching on the following topics:
+
+- 
+
+### Tasks
+
+- 
+
+### Questions
+
+1. 
diff --git a/_pages/dsc_180_intro.md b/_pages/dsc_180_intro.md
@@ -0,0 +1,44 @@
+---
+permalink: /dsc_180_intro/
+title: "DSC 180ab B03 - Introduction to topic, domain, and paper"
+layout: single
+toc: true
+toc_sticky: true
+---
+
+
+Welcome, Welcome, Welcome!
+
+Week 1 is upon us, and we are going to spend it getting everyone situated, going over the primary paper, and beginning to read up on the domain of climate model simulation and emulation.
+
+### Topics
+
+You will be touching on these topics over the first week:
+
+- What is sepsis, how it can be diagnosed/treated and why it is very deadly from an epidemiological point of view
+- Severity of illness scores
+- EHR data
+- Reproducibility/replicability in data science
+
+### Tasks
+
+- Gain access to the MIMIC-III and MIMIC-IV datasets (see instructions on the home page)
+- Complete all assigned readings
+- Begin to familiarize yourself with the domain by thumbing through the Domain Expertise page
+- Submit question answers to the online form (linked below)
+- Join the class Discord Channel: TBD
+
+### Readings
+
+- Read the capstone primary paper by Zador et al. linked on the home page
+- MIMIC-III paper by Johnson et al. link - if you have not already finished it
+- MIMIC-III web description link - if you have not already finished it
+
+### Questions
+
+Answer the following questions using this Google Form link:
+
+1. What are the primary goals of the research paper we aim to replicate? Additionally, what do you anticipate to be the major hurdles in this process? Lastly, define what outcomes you would consider a successful completion of Phase I in this capstone project.
+2. Provide a detailed explanation of what an Electronic Health Record (EHR) is. How does EHR relate to and integrate with the MIMIC datasets?
+3. Explain the concept and utility of Illness Scores in healthcare. Why do multiple illness scores exist? Select one illness score from the following - OASIS, SAPS II, or SOFA. Write a brief overview, including its applications and key attributes. Note: The aim is for each team member to gain expertise in at least one illness score, so ensure that all three scores are explored by different members.
+
diff --git a/_pages/dsc_180_summer.md b/_pages/dsc_180_summer.md
@@ -0,0 +1,11 @@
+---
+permalink: /dsc_180_summer/
+title: "DSC 180ab B03 - Summer tasks"
+layout: single
+---
+
+I don't expect much over the summer, but the following activities would be useful and allow you to hit the ground running in the Fall:
+
+- Skim the latest UN Intergovernmental Panel on Climate Change Synthesis Report to get a summary of the latest climate change science, especially the figures: https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_SPM.pdf
+- Read the ClimateBench paper: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954
+- Try out the xarray Python library for working with climate data: https://docs.xarray.dev/en/stable/
diff --git a/_pages/dsc_180_validate.md b/_pages/dsc_180_validate.md
@@ -0,0 +1,20 @@
+---
+permalink: /dsc_180_validate/
+title: "DSC 180ab B03 - Perform validation and testing of baselines"
+layout: single
+---
+
+
+### Topics
+
+You will be touching on the following topics:
+
+- 
+
+### Tasks
+
+- 
+
+### Questions
+
+1. 
diff --git a/_pages/dsc_180_xarray.md b/_pages/dsc_180_xarray.md
@@ -0,0 +1,20 @@
+---
+permalink: /dsc_180_xarray/
+title: "DSC 180ab B03 - Begin data preprocessing and learn about xarray"
+layout: single
+---
+
+
+### Topics
+
+You will be touching on the following topics:
+
+- 
+
+### Tasks
+
+- 
+
+### Questions
+
+1.