Commit e16924c

start notebook
1 parent 9236209 commit e16924c

4 files changed

Lines changed: 221 additions & 33 deletions

dataretrieval/waterdata/__init__.py

Lines changed: 0 additions & 2 deletions
@@ -11,7 +11,6 @@

 # Public API exports
 from .api import (
-    _check_profiles,
     get_codes,
     get_continuous,
     get_daily,
@@ -41,7 +40,6 @@
     "get_reference_table",
     "get_samples",
     "get_time_series_metadata",
-    "_check_profiles",
     "CODE_SERVICES",
     "SERVICES",
     "PROFILES",

dataretrieval/waterdata/api.py

Lines changed: 2 additions & 30 deletions
@@ -17,15 +17,15 @@
 from dataretrieval.waterdata.types import (
     CODE_SERVICES,
     METADATA_COLLECTIONS,
-    PROFILE_LOOKUP,
     PROFILES,
     SERVICES,
 )
 from dataretrieval.waterdata.utils import (
     SAMPLES_URL,
     get_ogc_data,
     _construct_api_requests,
-    _walk_pages
+    _walk_pages,
+    _check_profiles
 )

 # Set up logger for this module
@@ -1703,31 +1703,3 @@ def get_samples(

     return df, BaseMetadata(response)

-
-def _check_profiles(
-    service: SERVICES,
-    profile: PROFILES,
-) -> None:
-    """Check whether a service profile is valid.
-
-    Parameters
-    ----------
-    service : string
-        One of the service names from the "services" list.
-    profile : string
-        One of the profile names from "results_profiles",
-        "locations_profiles", "activities_profiles",
-        "projects_profiles" or "organizations_profiles".
-    """
-    valid_services = get_args(SERVICES)
-    if service not in valid_services:
-        raise ValueError(
-            f"Invalid service: '{service}'. Valid options are: {valid_services}."
-        )
-
-    valid_profiles = PROFILE_LOOKUP[service]
-    if profile not in valid_profiles:
-        raise ValueError(
-            f"Invalid profile: '{profile}' for service '{service}'. "
-            f"Valid options are: {valid_profiles}."
-        )

dataretrieval/waterdata/utils.py

Lines changed: 35 additions & 1 deletion
@@ -4,7 +4,7 @@
 import os
 import re
 from datetime import datetime
-from typing import Any, Dict, List, Optional, Tuple, Union
+from typing import Any, Dict, List, Optional, Tuple, Union, get_args

 import pandas as pd
 import requests
@@ -13,6 +13,12 @@
 from dataretrieval.utils import BaseMetadata
 from dataretrieval import __version__

+from dataretrieval.waterdata.types import (
+    PROFILE_LOOKUP,
+    PROFILES,
+    SERVICES,
+)
+
 try:
     import geopandas as gpd

@@ -824,3 +830,31 @@ def get_ogc_data(
     return return_list, metadata


+def _check_profiles(
+    service: SERVICES,
+    profile: PROFILES,
+) -> None:
+    """Check whether a service profile is valid.
+
+    Parameters
+    ----------
+    service : string
+        One of the service names from the "services" list.
+    profile : string
+        One of the profile names from "results_profiles",
+        "locations_profiles", "activities_profiles",
+        "projects_profiles" or "organizations_profiles".
+    """
+    valid_services = get_args(SERVICES)
+    if service not in valid_services:
+        raise ValueError(
+            f"Invalid service: '{service}'. Valid options are: {valid_services}."
+        )
+
+    valid_profiles = PROFILE_LOOKUP[service]
+    if profile not in valid_profiles:
+        raise ValueError(
+            f"Invalid profile: '{profile}' for service '{service}'. "
+            f"Valid options are: {valid_profiles}."
+        )
+
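Note on the moved helper: `_check_profiles` depends on `typing.get_args` to turn a `Literal` alias into a tuple of allowed strings, which is why this commit adds `get_args` to the `utils.py` imports. The following is a minimal, self-contained sketch of that pattern. The `Service` alias, the `PROFILE_LOOKUP` contents, and the `check_profile` name are hypothetical stand-ins invented for illustration; they are not the real definitions in `dataretrieval.waterdata.types`.

```python
from typing import Literal, get_args

# Hypothetical stand-ins for SERVICES and PROFILE_LOOKUP; the real
# values live in dataretrieval.waterdata.types and will differ.
Service = Literal["results", "locations"]
PROFILE_LOOKUP = {
    "results": ("profile_a", "profile_b"),
    "locations": ("profile_c",),
}


def check_profile(service: str, profile: str) -> None:
    """Raise ValueError if the service or its profile is not recognized."""
    # get_args() unpacks the Literal alias into a tuple of its string options.
    valid_services = get_args(Service)
    if service not in valid_services:
        raise ValueError(
            f"Invalid service: '{service}'. Valid options are: {valid_services}."
        )

    # Profiles are validated per service, after the service itself passes.
    valid_profiles = PROFILE_LOOKUP[service]
    if profile not in valid_profiles:
        raise ValueError(
            f"Invalid profile: '{profile}' for service '{service}'. "
            f"Valid options are: {valid_profiles}."
        )
```

Checking the service first means the `PROFILE_LOOKUP[service]` lookup should not raise a `KeyError`, assuming the lookup has an entry for every service in the alias.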

demos/WaterData_demo.ipynb

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7d0ca866",
+   "metadata": {},
+   "source": [
+    "# Using the `waterdata` module to pull data from the USGS Water Data APIs\n",
+    "The `waterdata` module will eventually replace the `nwis` module for accessing USGS water data. It leverages the [Water Data APIs](https://api.waterdata.usgs.gov/) to download metadata, daily values, and instantaneous values.\n",
+    "\n",
+    "Although the timeline for this transition is still uncertain, we advise switching to the new functions as soon as possible to avoid unexpected interruptions in your workflow.\n",
+    "\n",
+    "As always, please report any issues you encounter on our [Issues](https://github.com/DOI-USGS/dataretrieval-python/issues) page. If you have questions or need help, please reach out to us at comptools@usgs.gov."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fcccb6e8",
+   "metadata": {},
+   "source": [
+    "## Prerequisite: Get your Water Data API key\n",
+    "We strongly suggest signing up for your own API key [here](https://api.waterdata.usgs.gov/signup/) to get higher rate limits and more reliable access to the data. Without an API key, the number of requests you can make to the Water Data APIs is considerably lower, and if you share an IP address across users or workflows, you may hit those limits even faster. Luckily, registering for an API key is free and easy.\n",
+    "\n",
+    "Once you've copied your API key and saved it in a safe place, you can set it as an environment variable in your Python script for the current session:\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "os.environ['API_USGS_PAT'] = 'your_api_key_here'\n",
+    "```\n",
+    "Note that the environment variable name is `API_USGS_PAT`, which stands for \"API USGS Personal Access Token\".\n",
+    "\n",
+    "If you'd like a more permanent repository-specific solution, you can use the `python-dotenv` package to read your API key from a `.env` file in your repository root directory, like this:\n",
+    "\n",
+    "```python\n",
+    "!pip install python-dotenv # only run this line once to install the package in your environment\n",
+    "from dotenv import load_dotenv\n",
+    "load_dotenv() # this will load the environment variables from the .env file\n",
+    "```\n",
+    "Make sure your `.env` file contains the following line:\n",
+    "```\n",
+    "API_USGS_PAT=your_api_key_here\n",
+    "```\n",
+    "Also, do not commit your `.env` file to version control, as it contains sensitive information. You can add it to your `.gitignore` file to prevent accidental commits."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a2b3f0f",
+   "metadata": {},
+   "source": [
+    "## Lay of the Land\n",
+    "Now that your API key is configured, it's time to take a 10,000-ft view of the functions in the `waterdata` module.\n",
+    "\n",
+    "### Metadata endpoints\n",
+    "These functions retrieve metadata tables that can be used to refine your data requests.\n",
+    "\n",
+    "- `get_reference_table()` - Not sure which parameter code you're looking for, or which hydrologic unit your study area is in? This function will help you find the right input values for the data endpoints to retrieve the information you want.\n",
+    "- `get_codes()` - Similar to `get_reference_table()`, this function retrieves dataframes containing available input values that correspond to the Samples database.\n",
+    "\n",
+    "### Data endpoints\n",
+    "- `get_daily()` - Daily values for monitoring locations, parameters, statistical codes, and more.\n",
+    "- `get_continuous()` - Instantaneous values for monitoring locations, parameters, statistical codes, and more.\n",
+    "- `get_monitoring_locations()` - Monitoring location information such as name, monitoring location ID, latitude, longitude, HUC code, site types, and more.\n",
+    "- `get_time_series_metadata()` - Time series metadata across monitoring locations, parameter codes, statistical codes, and more. Can be used to answer the question: what types of data are collected at my site(s) of interest and over what time period are/were they collected?\n",
+    "- `get_latest_continuous()` - Latest instantaneous values for requested monitoring locations, parameter codes, statistical codes, and more.\n",
+    "- `get_latest_daily()` - Latest daily values for requested monitoring locations, parameter codes, statistical codes, and more.\n",
+    "- `get_field_measurements()` - Physically measured values (a.k.a. discrete measurements) of gage height, discharge, groundwater levels, and more for requested monitoring locations.\n",
+    "- `get_samples()` - Discrete water quality sample results for monitoring locations, observed properties, and more."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "68591b52",
+   "metadata": {},
+   "source": [
+    "## Examples\n",
+    "Let's get into some examples using the functions listed above. First, we need to load the `waterdata` module."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4ca9bb6a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import display\n",
+    "from dataretrieval import waterdata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1035ebbb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pcodes, metadata = waterdata.get_reference_table(\"parameter-codes\")\n",
+    "display(pcodes.head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "176c665b",
+   "metadata": {},
+   "source": [
+    "What is this `metadata` element? Let's take a look:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "30b1b052",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metadata"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e0eab77",
+   "metadata": {},
+   "source": [
+    "All of these functions return tuples containing a dataframe and a metadata element that describes the request made. This `BaseMetadata` object contains the request URL.\n",
+    "\n",
+    "Let's say we want to find all parameter codes relating to streamflow discharge. We can use some string matching to find applicable codes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "665ccb23",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "streamflow_pcodes = pcodes[pcodes['parameter_name'].str.contains('streamflow|discharge', case=False, na=False)]\n",
+    "display(streamflow_pcodes[['parameter_code_id', 'parameter_name']])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9487ee4",
+   "metadata": {},
+   "source": [
+    "Interesting that there are so many different streamflow-related parameter codes! Going on experience, let's use the most common one, `00060`, which is \"Discharge, cubic feet per second\".\n",
+    "\n",
+    "Now that we know which parameter code we want to use, let's find all the stream monitoring locations that have recent discharge data and at least 10 years of daily values in the state of Nebraska."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce4df5fb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "NE_locations, _ = waterdata.get_monitoring_locations(state_name=\"Nebraska\", site_type_code=\"ST\")\n",
+    "display(NE_locations.head())"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "dr-test",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
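
Note on the API key prerequisite: the notebook asks readers to set `API_USGS_PAT` before making requests, so a quick check at the top of a session may save some confusion about rate limits. The sketch below is one possible way to do that with the standard library only. The `has_api_key` helper is hypothetical and is not part of `dataretrieval`; only the environment variable name comes from the notebook.

```python
import os
from typing import Mapping, Optional


def has_api_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if API_USGS_PAT is set to a non-blank value.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    # Treat a missing variable and a whitespace-only value the same way.
    return bool(env.get("API_USGS_PAT", "").strip())


if not has_api_key():
    print("API_USGS_PAT is not set; requests may be subject to lower rate limits.")
```

Passing the environment in as an argument keeps the check easy to test without modifying the real process environment.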
