Commit 31d457b

refactor/superset: update Superset to 6.0.0 (#168)
- Bumps the Superset Docker image from `4.0.2` → `6.0.0`
- Updates bootstrap/init scripts and Python config accordingly ([official 6.0.0 docker setup](https://github.com/apache/superset/tree/6.0.0/docker))
- Adds `uv pip` support with pip fallback, `DEV_MODE` editable install, configurable log level, expanded Celery task imports, and thumbnail cache config
1 parent 95cc282 commit 31d457b

8 files changed

Lines changed: 139 additions & 45 deletions

Lines changed: 51 additions & 0 deletions
````diff
@@ -0,0 +1,51 @@
+# Data visualization with Superset
+
+[![Superset](https://img.shields.io/badge/Superset-262A38?style=flat&logo=apachesuperset&logoColor=1EA7C9&labelColor=262A38)](https://superset.apache.org/)
+[![BigQuery](https://img.shields.io/badge/BigQuery-262A38?style=flat&logo=googlebigquery&logoColor=white&labelColor=3772FF)](https://console.cloud.google.com/bigquery)
+[![Snowflake](https://img.shields.io/badge/Snowflake-262A38?style=flat&logo=snowflake&logoColor=white&labelColor=249EDC)](https://www.snowflake.com/en/product/platform/)
+[![PostgreSQL](https://img.shields.io/badge/PostgreSQL-262A38?style=flat&logo=postgresql&logoColor=white&labelColor=336791)](https://hub.docker.com/_/postgres)
+[![ClickHouse](https://img.shields.io/badge/ClickHouse-262A38?style=flat&logo=clickhouse&logoColor=FFFFFF&labelColor=262A38)](https://clickhouse.com/docs/en/install)
+[![DuckDB](https://img.shields.io/badge/DuckDB-262A38?style=flat&logo=duckdb&logoColor=FEF000&labelColor=262A38)](https://duckdb.org/docs/)
+
+![License](https://img.shields.io/badge/license-CC--BY--SA--4.0-31393F?style=flat&logo=creativecommons&logoColor=black&labelColor=white)
+
+
+## Getting Started
+
+**1.** Spin up Apache Superset infrastructure with:
+```shell
+docker compose up -d
+```
+
+**2.** After the `superset-app` container is healthy, access Superset at [http://localhost:8088](http://localhost:8088/)
+
+```text
+Username: admin
+Password: admin
+```
+
+### Pre-loaded examples
+
+If you'd like Superset to come with example charts and dashboards, set this **before the first run**:
+```shell
+export SUPERSET_LOAD_EXAMPLES=yes
+```
+
+Once `superset-init` finishes, make sure to disable it so the examples aren't re-loaded on every restart:
+```shell
+unset SUPERSET_LOAD_EXAMPLES
+```
+
+### Additional database drivers
+
+Superset supports PostgreSQL and MySQL out-of-the-box. To enable additional data sources, add the respective `SQLAlchemy` driver to [requirements-local.txt](./conf/requirements-local.txt). See the full list of [supported databases](https://superset.apache.org/docs/databases/)
+```text
+clickhouse-connect==0.8.15
+sqlalchemy-bigquery==1.12.1
+sqlalchemy-redshift==0.8.14
+```
+
+
+## TODO's
+- [x] Bootstrap Apache Superset infrastructure in Docker
+- [x] Build data viz for NYC Taxi Dataset on Superset
````
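Step 2 of the README waits for the app to become healthy. If you would rather script that wait, the minimal Python sketch below polls the web app until it answers. It assumes Superset's `/health` endpoint on the README's `http://localhost:8088`. `wait_until_healthy` and `fetch_status` are hypothetical helpers and are not part of this repository.

```python
import time
import urllib.request
from typing import Callable

# Assumption: Superset answers GET /health with HTTP 200 once the web app is up.
HEALTH_URL = "http://localhost:8088/health"


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError:
        # HTTPError, URLError (e.g. connection refused) and timeouts are all OSError subclasses
        return 0


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll url until it returns 200; give up after `attempts` tries."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        time.sleep(delay)
    return False
```

The `fetch` parameter is injectable, so you can exercise the loop without a running container. Against the real stack, call `wait_until_healthy()` after `docker compose up -d`.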

module4-analytics-engineering/visualization/compose.superset.yaml renamed to module4-analytics-engineering/visualization/superset/compose.yaml

Lines changed: 7 additions & 6 deletions
```diff
@@ -1,6 +1,6 @@
-x-superset-image: &superset-image apache/superset:${SUPERSET_VERSION:-4.0.2}
-x-postgres-image: &postgres-image postgres:${POSTGRES_VERSION:-17-alpine}
-x-redis-image: &redis-image redis:${REDIS_VERSION:-7-alpine}
+x-superset-image: &superset-image apache/superset:${SUPERSET_VERSION:-6.0.0}
+x-postgres-image: &postgres-image postgres:${POSTGRES_VERSION:-18.1-alpine}
+x-redis-image: &redis-image redis:${REDIS_VERSION:-8.6-alpine}
 
 x-superset-common:
   &superset-common
```
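The image tags above use Compose variable interpolation, where `${VAR:-default}` falls back to the default when `VAR` is unset or empty. The small Python sketch below models that rule; it is not how Compose itself is implemented, and its helper names are hypothetical. It shows which images the new defaults resolve to.

```python
from typing import Mapping

# Defaults from the x-*-image anchors above: (repository, env var, default tag)
IMAGE_DEFAULTS = {
    "superset": ("apache/superset", "SUPERSET_VERSION", "6.0.0"),
    "postgres": ("postgres", "POSTGRES_VERSION", "18.1-alpine"),
    "redis": ("redis", "REDIS_VERSION", "8.6-alpine"),
}


def colon_dash_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${NAME:-default}: fall back when NAME is unset *or* empty."""
    value = env.get(name)
    return value if value else default


def resolve_images(env: Mapping[str, str]) -> dict[str, str]:
    """Return the image reference each anchor expands to for the given environment."""
    return {
        anchor: f"{repo}:{colon_dash_default(env, var, default)}"
        for anchor, (repo, var, default) in IMAGE_DEFAULTS.items()
    }
```

Exporting `SUPERSET_VERSION=4.0.2` before `docker compose up -d` would therefore still pull the old image. The updated scripts follow the 6.0.0 docker setup, though, so an older image may not work with them.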
```diff
@@ -21,21 +21,22 @@ x-superset-common:
     # Superset specific env vars
     SUPERSET_ENV: 'production'
     SUPERSET_PORT: 8088
-    SUPERSET_LOAD_EXAMPLES: ${SUPERSET_LOAD_EXAMPLES:-no}
     SUPERSET_SECRET_KEY: 'TEST_NON_DEV_SECRET'
     CYPRESS_CONFIG: 'false'
+    SUPERSET_LOG_LEVEL: 'info'
     MAPBOX_API_KEY: ''
     # Add the mapped in /app/pythonpath_docker which allows devs to override stuff
     PYTHONPATH: '/app/pythonpath:/app/docker/pythonpath_dev'
     # Examples DB
+    SUPERSET_LOAD_EXAMPLES: ${SUPERSET_LOAD_EXAMPLES:-no}
     EXAMPLES_HOST: 'superset-db'
     EXAMPLES_PORT: 5432
     EXAMPLES_DB: 'examples'
     EXAMPLES_USER: 'examples'
     EXAMPLES_PASSWORD: 'examples'
   volumes:
     &superset-common-volumes
-    - ./superset:/app/docker
+    - ./conf:/app/docker
     - vol-superset-home:/app/superset_home
   depends_on:
     &superset-common-depends-on
```
```diff
@@ -59,7 +60,7 @@ services:
       - '5432'
     volumes:
       - vol-superset-db:/var/lib/postgresql/data
-      - ./superset/examples-init.sh:/docker-entrypoint-initdb.d/examples-init.sh
+      - ./conf/examples-init.sh:/docker-entrypoint-initdb.d/examples-init.sh
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U superset"]
       interval: 5s
```

module4-analytics-engineering/visualization/superset/docker-bootstrap.sh renamed to module4-analytics-engineering/visualization/superset/conf/docker-bootstrap.sh

Lines changed: 30 additions & 13 deletions
```diff
@@ -18,35 +18,52 @@
 
 set -eo pipefail
 
+# Make python interactive
+if [ "$DEV_MODE" == "true" ]; then
+  if [ "$(whoami)" = "root" ] && command -v uv > /dev/null 2>&1; then
+    echo "Reinstalling the app in editable mode"
+    uv pip install -e .
+  fi
+fi
 REQUIREMENTS_LOCAL="/app/docker/requirements-local.txt"
+PORT=${PORT:-8088}
 # If Cypress run – overwrite the password for admin and export env variables
 if [ "$CYPRESS_CONFIG" == "true" ]; then
-  export SUPERSET_CONFIG=tests.integration_tests.superset_test_config
   export SUPERSET_TESTENV=true
-  export SUPERSET__SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://superset:superset@db:5432/superset
+  export POSTGRES_DB=superset_cypress
+  export SUPERSET__SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://superset:superset@superset-db:5432/superset_cypress
+  PORT=8081
+fi
+if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ]; then
+  # older images may not have the postgres dev requirements installed
+  echo "Installing postgres requirements"
+  if command -v uv > /dev/null 2>&1; then
+    # Use uv in newer images
+    uv pip install -e .[postgres]
+  else
+    # Use pip in older images
+    pip install -e .[postgres]
+  fi
 fi
 #
 # Make sure we have dev requirements installed
 #
 if [ -f "${REQUIREMENTS_LOCAL}" ]; then
   echo "Installing local overrides at ${REQUIREMENTS_LOCAL}"
-  pip install --no-cache-dir -r "${REQUIREMENTS_LOCAL}"
+  if command -v uv > /dev/null 2>&1; then
+    uv pip install --no-cache-dir -r "${REQUIREMENTS_LOCAL}"
+  else
+    pip install --no-cache-dir -r "${REQUIREMENTS_LOCAL}"
+  fi
 else
   echo "Skipping local overrides"
 fi
 
-#
-# playwright is an optional package - run only if it is installed
-#
-if command -v playwright > /dev/null 2>&1; then
-  playwright install-deps
-  playwright install chromium
-fi
-
 case "${1}" in
   worker)
     echo "Starting Celery worker..."
-    celery --app=superset.tasks.celery_app:app worker -O fair -l INFO
+    # setting up only 2 workers by default to contain memory usage in dev environments
+    celery --app=superset.tasks.celery_app:app worker -O fair -l INFO --concurrency=${CELERYD_CONCURRENCY:-2}
     ;;
   beat)
     echo "Starting Celery beat..."
```
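Every install step in the new bootstrap script uses the same probe: `command -v uv` decides between `uv pip` and plain `pip`. The Python sketch below models that decision, with `shutil.which` standing in for `command -v`. `install_command` is a hypothetical helper that only builds the argument list and does not run anything.

```python
import shutil
from typing import Callable, Optional


def install_command(
    requirements: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv docker-bootstrap.sh would use to install a requirements file.

    `which` plays the role of `command -v`: it returns a path when the
    executable is on PATH and None otherwise.
    """
    installer = ["uv", "pip"] if which("uv") else ["pip"]
    return installer + ["install", "--no-cache-dir", "-r", requirements]
```

Passing the result to `subprocess.run(cmd, check=True)` would execute it. A failed install would then raise, much as `set -e` aborts the shell script.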
```diff
@@ -55,7 +72,7 @@ case "${1}" in
     ;;
   app)
     echo "Starting web app (using development server)..."
-    flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
+    flask run -p $PORT --with-threads --reload --debugger --host=0.0.0.0
     ;;
   app-gunicorn)
     echo "Starting web app..."
```
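The development server no longer hard-codes port 8088. `PORT` defaults to 8088 and the Cypress branch forces 8081. The Celery worker now takes its concurrency from `CELERYD_CONCURRENCY`, which defaults to 2. The Python sketch below shows how those values resolve. The function names are hypothetical, and empty values count as unset, as with `${VAR:-default}`.

```python
from typing import Mapping


def dev_server_port(env: Mapping[str, str]) -> int:
    """Port passed to `flask run -p $PORT`."""
    port = env.get("PORT") or "8088"  # PORT=${PORT:-8088}
    if env.get("CYPRESS_CONFIG") == "true":
        port = "8081"  # the Cypress branch overrides PORT unconditionally
    return int(port)


def celery_worker_argv(env: Mapping[str, str]) -> list[str]:
    """Command line of the `worker)` case arm."""
    concurrency = env.get("CELERYD_CONCURRENCY") or "2"  # ${CELERYD_CONCURRENCY:-2}
    return [
        "celery", "--app=superset.tasks.celery_app:app", "worker",
        "-O", "fair", "-l", "INFO", f"--concurrency={concurrency}",
    ]
```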

module4-analytics-engineering/visualization/superset/docker-init.sh renamed to module4-analytics-engineering/visualization/superset/conf/docker-init.sh

Lines changed: 19 additions & 18 deletions
```diff
@@ -22,28 +22,26 @@ set -e
 #
 /app/docker/docker-bootstrap.sh
 
-STEP_CNT=4
+if [ "$SUPERSET_LOAD_EXAMPLES" = "yes" ]; then
+  STEP_CNT=4
+else
+  STEP_CNT=3
+fi
 
 echo_step() {
 cat <<EOF
-
 ######################################################################
-
-
 Init Step ${1}/${STEP_CNT} [${2}] -- ${3}
-
-
 ######################################################################
-
 EOF
 }
-ADMIN_PASSWORD="admin"
+ADMIN_PASSWORD="${ADMIN_PASSWORD:-admin}"
 # If Cypress run – overwrite the password for admin and export env variables
 if [ "$CYPRESS_CONFIG" == "true" ]; then
   ADMIN_PASSWORD="general"
-  export SUPERSET_CONFIG=tests.integration_tests.superset_test_config
   export SUPERSET_TESTENV=true
-  export SUPERSET__SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://superset:superset@db:5432/superset
+  export POSTGRES_DB=superset_cypress
+  export SUPERSET__SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://superset:superset@superset-db:5432/superset_cypress
 fi
 # Initialize the database
 echo_step "1" "Starting" "Applying DB migrations"
```
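`STEP_CNT` now depends on `SUPERSET_LOAD_EXAMPLES`, so the banner reads `Init Step n/3` unless examples are loaded. The admin password can also come from the environment. The Python sketch below models that logic; the helper names are hypothetical, and the step labels are copied from the script's `echo_step` calls.

```python
from typing import Mapping


def init_plan(env: Mapping[str, str]) -> list[str]:
    """Return the step labels docker-init.sh announces, in order."""
    steps = [
        "Applying DB migrations",
        "Setting up admin user",
        "Setting up roles and perms",
    ]
    # STEP_CNT is 4 only when examples will actually be loaded
    if env.get("SUPERSET_LOAD_EXAMPLES") == "yes":
        steps.append("Loading examples")
    return steps


def admin_password(env: Mapping[str, str]) -> str:
    """Password shown in step 2 and used for `fab create-admin`."""
    if env.get("CYPRESS_CONFIG") == "true":
        return "general"
    return env.get("ADMIN_PASSWORD") or "admin"  # ${ADMIN_PASSWORD:-admin}
```

The shell test is an exact string comparison, so only `yes` enables the examples step. Values such as `YES` or `true` leave the count at three.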
```diff
@@ -52,12 +50,16 @@ echo_step "1" "Complete" "Applying DB migrations"
 
 # Create an admin user
 echo_step "2" "Starting" "Setting up admin user ( admin / $ADMIN_PASSWORD )"
-superset fab create-admin \
-    --username admin \
-    --firstname Superset \
-    --lastname Admin \
-    --email admin@superset.com \
-    --password $ADMIN_PASSWORD
+if [ "$CYPRESS_CONFIG" == "true" ]; then
+  superset load_test_users
+else
+  superset fab create-admin \
+    --username admin \
+    --email admin@superset.com \
+    --password "$ADMIN_PASSWORD" \
+    --firstname Superset \
+    --lastname Admin
+fi
 echo_step "2" "Complete" "Setting up admin user"
 # Create default roles and permissions
 echo_step "3" "Starting" "Setting up roles and perms"
```
```diff
@@ -69,10 +71,9 @@ if [ "$SUPERSET_LOAD_EXAMPLES" = "yes" ]; then
   echo_step "4" "Starting" "Loading examples"
   # If Cypress run which consumes superset_test_config – load required data for tests
   if [ "$CYPRESS_CONFIG" == "true" ]; then
-    superset load_test_users
     superset load_examples --load-test-data
   else
-    superset load_examples --force
+    superset load_examples
   fi
   echo_step "4" "Complete" "Loading examples"
 fi
```

module4-analytics-engineering/visualization/superset/examples-init.sh renamed to module4-analytics-engineering/visualization/superset/conf/examples-init.sh

File renamed without changes.

module4-analytics-engineering/visualization/superset/pythonpath_dev/superset_config.py renamed to module4-analytics-engineering/visualization/superset/conf/pythonpath_dev/superset_config.py

Lines changed: 29 additions & 5 deletions
```diff
@@ -22,6 +22,7 @@
 #
 import logging
 import os
+import sys
 
 from celery.schedules import crontab
 from flask_caching.backends.filesystemcache import FileSystemCache
```
```diff
@@ -70,11 +71,17 @@
     "CACHE_REDIS_DB": REDIS_RESULTS_DB,
 }
 DATA_CACHE_CONFIG = CACHE_CONFIG
+THUMBNAIL_CACHE_CONFIG = CACHE_CONFIG
 
 
 class CeleryConfig:
     broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
-    imports = ("superset.sql_lab",)
+    imports = (
+        "superset.sql_lab",
+        "superset.tasks.scheduler",
+        "superset.tasks.thumbnails",
+        "superset.tasks.cache",
+    )
     result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
     worker_prefetch_multiplier = 1
     task_acks_late = False
```
```diff
@@ -94,22 +101,39 @@ class CeleryConfig:
 
 FEATURE_FLAGS = {"ALERT_REPORTS": True}
 ALERT_REPORTS_NOTIFICATION_DRY_RUN = True
-WEBDRIVER_BASEURL = "http://superset:8088/"
+WEBDRIVER_BASEURL = f"http://superset:8088{os.environ.get('SUPERSET_APP_ROOT', '/')}/"  # noqa: E501
 # The base URL for the email report hyperlinks.
-WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL
+WEBDRIVER_BASEURL_USER_FRIENDLY = (
+    f"http://localhost:8088{os.environ.get('SUPERSET_APP_ROOT', '/')}/"
+)
 
 SQLLAB_CTAS_NO_LIMIT = True
 
+log_level_text = os.getenv("SUPERSET_LOG_LEVEL", "INFO")
+LOG_LEVEL = getattr(logging, log_level_text.upper(), logging.INFO)
+
+if os.getenv("CYPRESS_CONFIG") == "true":
+    # When running the service as a cypress backend, we need to import the config
+    # located @ tests/integration_tests/superset_test_config.py
+    base_dir = os.path.dirname(__file__)
+    module_folder = os.path.abspath(
+        os.path.join(base_dir, "../../tests/integration_tests/")
+    )
+    sys.path.insert(0, module_folder)
+    from superset_test_config import *  # noqa
+
+    sys.path.pop(0)
+
 #
 # Optionally import superset_config_docker.py (which will have been included on
 # the PYTHONPATH) in order to allow for local settings to be overridden
 #
 try:
     import superset_config_docker
-    from superset_config_docker import *  # noqa
+    from superset_config_docker import *  # noqa: F403
 
     logger.info(
-        f"Loaded your Docker configuration at " f"[{superset_config_docker.__file__}]"
+        f"Loaded your Docker configuration at [{superset_config_docker.__file__}]"
     )
 except ImportError:
     logger.info("Using default Docker config...")
```
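Two groups of settings in this hunk are now derived from environment variables. The sketch below reproduces the committed expressions as plain functions; the function names are hypothetical. With it you can check what `WEBDRIVER_BASEURL`, `WEBDRIVER_BASEURL_USER_FRIENDLY` and `LOG_LEVEL` evaluate to for a given environment, for example via `webdriver_urls(os.environ)`.

```python
import logging
from typing import Mapping


def webdriver_urls(env: Mapping[str, str]) -> tuple[str, str]:
    """Same f-strings as the config: (WEBDRIVER_BASEURL, WEBDRIVER_BASEURL_USER_FRIENDLY)."""
    root = env.get("SUPERSET_APP_ROOT", "/")  # like os.environ.get: empty stays empty
    return f"http://superset:8088{root}/", f"http://localhost:8088{root}/"


def log_level(env: Mapping[str, str]) -> int:
    """Same lookup as LOG_LEVEL: unknown level names fall back to INFO."""
    text = env.get("SUPERSET_LOG_LEVEL", "INFO")
    return getattr(logging, text.upper(), logging.INFO)
```

The default root is `/` and each f-string appends another literal `/`, so an unset `SUPERSET_APP_ROOT` yields URLs ending in `//`. A sub-path such as `/analytics` yields `.../analytics/`. The compose file's `SUPERSET_LOG_LEVEL: 'info'` maps to `logging.INFO`, and an unrecognized name falls back to the same level.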
Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+clickhouse-connect==0.11.0
+sqlalchemy-bigquery==1.16.0
+snowflake-sqlalchemy==1.8.2
```
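Once these drivers are installed, each database is added in Superset through a SQLAlchemy URI. The hedged sketch below shows the URI shapes. It assumes the formats documented for each driver: `clickhousedb://` for clickhouse-connect, `bigquery://` for sqlalchemy-bigquery, and `snowflake://` for snowflake-sqlalchemy. Host names, credentials and helper names are hypothetical.

```python
from urllib.parse import quote


def _cred(user: str, password: str) -> str:
    # percent-encode so characters like '@' or ':' cannot break the URI
    return f"{quote(user, safe='')}:{quote(password, safe='')}"


def clickhouse_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    # assumed dialect name registered by clickhouse-connect
    return f"clickhousedb://{_cred(user, password)}@{host}:{port}/{database}"


def bigquery_uri(project_id: str) -> str:
    # credentials (e.g. a service-account key) are typically supplied outside the URI
    return f"bigquery://{project_id}"


def snowflake_uri(user: str, password: str, account: str, database: str,
                  schema: str, warehouse: str, role: str) -> str:
    return (
        f"snowflake://{_cred(user, password)}@{account}/{database}/{schema}"
        f"?warehouse={warehouse}&role={role}"
    )
```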

module4-analytics-engineering/visualization/superset/requirements-local.txt

Lines changed: 0 additions & 3 deletions
This file was deleted.
