This example walks through a complete MLOps workflow using a synthetic housing price dataset. By the end you will have a model trained and registered in MLflow, served via FastAPI, and monitored in Grafana with BigQuery as the data backend.
- Deploy the infrastructure following the GCP Cloud Run tutorial
- Run `deployml get-urls --config-path config.yaml` to generate your `.env` file
- Install Python dependencies: `pip install mlflow scikit-learn pandas numpy google-cloud-bigquery db-dtypes python-dotenv requests`

All scripts read from the `.env` file written by `deployml get-urls`. It should contain:

    MLFLOW_URL=https://...
    FASTAPI_URL=https://...
    GRAFANA_URL=https://...
    BIGQUERY_PROJECT=your-project-id
    BIGQUERY_DATASET=mlops

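Before running the scripts, it can help to confirm that the `.env` file has all five keys. The sketch below is one way to do that with only the standard library. The example scripts presumably load the file with `python-dotenv`, and the helper names `parse_env` and `missing_keys` are hypothetical, not part of the project.

```python
from pathlib import Path

# Keys the .env file is expected to contain, taken from the listing above.
REQUIRED_KEYS = (
    "MLFLOW_URL",
    "FASTAPI_URL",
    "GRAFANA_URL",
    "BIGQUERY_PROJECT",
    "BIGQUERY_DATASET",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict) -> list:
    """Return the required keys that are absent or empty, in listing order."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    print("missing keys:", missing if missing else "none")
```

Run it from the project root. Any key it reports as missing usually means `deployml get-urls` should be run again.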
Run the scripts in order from the project root (where `.env` lives):

`python example/scripts/01_load_training_data.py`

Generates 500 rows of synthetic housing data and loads them into the `offline_features` BigQuery table.

Verify:

``bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `YOUR_PROJECT.mlops.offline_features`'``

`python example/scripts/02_train_model.py`

Pulls features from BigQuery, trains a `RandomForestRegressor`, and logs parameters, metrics, and the model artifact to MLflow.

Verify: open `MLFLOW_URL` in your browser. You should see the `housing-price-prediction` experiment with a completed run.

`python example/scripts/03_register_model.py`

Finds the best run by RMSE (lowest error), registers it as `HousingPriceModel`, and promotes it to the Production stage.
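
To illustrate what "best run by RMSE" means, here is a minimal sketch that computes RMSE and picks the run with the lowest value. The run records and the metric key `rmse` are hypothetical stand-ins. The actual script reads runs from the MLflow tracking server, where the same ordering can typically be requested through `mlflow.search_runs` with an `order_by` clause.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def best_run_by_rmse(runs):
    """Return the run record with the lowest RMSE, ignoring runs without one."""
    scored = [r for r in runs if r.get("metrics", {}).get("rmse") is not None]
    if not scored:
        raise ValueError("no runs with an rmse metric")
    return min(scored, key=lambda r: r["metrics"]["rmse"])


# Hypothetical run records; real ones would come from the MLflow tracking server.
runs = [
    {"run_id": "a1", "metrics": {"rmse": 41250.0}},
    {"run_id": "b2", "metrics": {"rmse": 38900.5}},
    {"run_id": "c3", "metrics": {}},  # e.g. a failed run with no metric logged
]
best = best_run_by_rmse(runs)
print(best["run_id"])
```

Runs without the metric are skipped rather than treated as zero, so a failed run cannot be selected as the best one.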

Verify: in the MLflow UI, open Models → `HousingPriceModel` and confirm that a version is in the Production stage.

`python example/scripts/04_make_predictions.py`

Pulls 50 rows from `offline_features` and sends each to the FastAPI `/predict` endpoint. FastAPI loads the model from MLflow on startup and automatically logs each prediction to the `predictions` BigQuery table.
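
The request that each row turns into might look like the sketch below. The payload shape (a flat JSON object of feature values) is an assumption, since the `/predict` schema is not documented here. The real service may also expect extra fields such as `entity_id`. The host name and the helper `build_predict_request` are hypothetical. The sketch uses `urllib` to stay dependency-free, whereas the example scripts install `requests`.

```python
import json
import urllib.request

# Feature names from the dataset description in this README.
FEATURES = ("bedrooms", "bathrooms", "area_sqft", "lot_size",
            "year_built", "city", "state")


def build_predict_request(base_url, row):
    """Build a POST request for <base_url>/predict from one feature row."""
    missing = [name for name in FEATURES if name not in row]
    if missing:
        raise ValueError(f"row is missing features: {missing}")
    # Assumed payload shape: a flat JSON object of feature name -> value.
    payload = {name: row[name] for name in FEATURES}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


row = {
    "entity_id": "house-001",  # hypothetical identifier, not sent in this sketch
    "bedrooms": 3, "bathrooms": 2, "area_sqft": 1800, "lot_size": 5000,
    "year_built": 2003, "city": 1, "state": 0,
}
request = build_predict_request("https://fastapi.example.invalid/", row)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect before any traffic reaches the service.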

Verify:

``bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `YOUR_PROJECT.mlops.predictions`'``

Also check that FastAPI is serving the model (`model_loaded` should be `true` in the response):

`curl FASTAPI_URL/health`

`python example/scripts/05_generate_ground_truth.py`

For each prediction, generates a fake actual value (predicted value + noise) and writes it to the `ground_truth` table. In a real scenario this would be actual outcomes matched back by `entity_id`.
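
The "predicted value + noise" idea can be sketched as follows. The field names `predicted_price` and `actual_price`, the Gaussian noise, and its standard deviation are all assumptions for illustration. The script's real column names and noise model may differ.

```python
import random


def generate_ground_truth(predictions, noise_std=15000.0, seed=42):
    """Return one fake ground-truth record per prediction.

    actual = predicted + Gaussian noise. The noise scale and the field
    names are assumptions made for this sketch.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same values
    return [
        {
            "entity_id": p["entity_id"],
            "actual_price": p["predicted_price"] + rng.gauss(0.0, noise_std),
        }
        for p in predictions
    ]


# Hypothetical prediction records keyed by entity_id.
predictions = [
    {"entity_id": "house-001", "predicted_price": 412000.0},
    {"entity_id": "house-002", "predicted_price": 298500.0},
]
for record in generate_ground_truth(predictions):
    print(record["entity_id"], round(record["actual_price"], 2))
```

Carrying `entity_id` through unchanged is what lets the later MAE computation join each prediction to its outcome.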

`python example/scripts/06_compute_drift_metrics.py`

Computes feature mean shift (training distribution vs. recent data) and MAE (predictions vs. ground truth). Writes results to the `drift_metrics` table.
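
As a rough illustration of the two metrics, the sketch below defines mean shift as the recent mean minus the training mean for each feature, and MAE over the entity IDs that appear in both predictions and ground truth. Both definitions are assumptions. The script might normalize the shift or join the tables differently.

```python
from statistics import fmean


def feature_mean_shift(training_rows, recent_rows, features):
    """Per-feature difference: mean(recent) - mean(training)."""
    return {
        name: fmean(r[name] for r in recent_rows)
        - fmean(r[name] for r in training_rows)
        for name in features
    }


def mean_absolute_error(predicted, actual):
    """MAE over the entity_ids present in both mappings."""
    shared = predicted.keys() & actual.keys()
    if not shared:
        raise ValueError("no overlapping entity_ids")
    return fmean(abs(predicted[k] - actual[k]) for k in shared)


# Tiny hand-made inputs so the arithmetic is easy to follow.
training_rows = [{"area_sqft": 1000, "bedrooms": 2}, {"area_sqft": 2000, "bedrooms": 4}]
recent_rows = [{"area_sqft": 2000, "bedrooms": 3}, {"area_sqft": 3000, "bedrooms": 3}]
shift = feature_mean_shift(training_rows, recent_rows, ["area_sqft", "bedrooms"])

predicted = {"house-001": 100.0, "house-002": 200.0}
actual = {"house-001": 110.0, "house-002": 180.0, "house-003": 5.0}
mae = mean_absolute_error(predicted, actual)
print(shift, mae)
```

Here `area_sqft` shifts by 1000 (2500 minus 1500), `bedrooms` does not shift, and the MAE is 15 (the average of 10 and 20). `house-003` has no prediction, so it is ignored.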

`python example/scripts/07_setup_grafana.py`

Provisions a monitoring dashboard in Grafana via the API showing:
- Prediction volume over time
- Mean predicted price over time
- Feature mean shift per feature
- MAE over time
Open GRAFANA_URL in your browser (login: admin / admin) to view the dashboard.
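
For orientation, a dashboard like this is commonly created by posting a JSON body to Grafana's `/api/dashboards/db` endpoint. The sketch below only builds such a body for the four panels listed above. The dashboard title, panel type, and layout are assumptions. The queries are left empty because they depend on how the BigQuery data source is configured.

```python
import json

# Panel titles taken from the list above.
PANEL_TITLES = (
    "Prediction volume over time",
    "Mean predicted price over time",
    "Feature mean shift per feature",
    "MAE over time",
)


def build_dashboard_payload(title="Housing Price Monitoring"):
    """Build a request body for Grafana's POST /api/dashboards/db endpoint."""
    panels = []
    for index, panel_title in enumerate(PANEL_TITLES):
        panels.append({
            "id": index + 1,
            "title": panel_title,
            "type": "timeseries",  # assumed panel type
            # Two panels per row on Grafana's 24-column grid.
            "gridPos": {"x": (index % 2) * 12, "y": (index // 2) * 8, "w": 12, "h": 8},
            # Queries omitted: they depend on the BigQuery data source setup.
            "targets": [],
        })
    return {
        # id and uid of None ask Grafana to create a new dashboard.
        "dashboard": {"id": None, "uid": None, "title": title, "panels": panels},
        "overwrite": True,
    }


payload = build_dashboard_payload()
print(json.dumps(payload, indent=2)[:200])
```

Setting `overwrite` to true is intended to make the call repeatable, so rerunning a setup step replaces the dashboard instead of failing on a name clash.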
Synthetic housing data with features:
| Feature | Description |
|---|---|
| `bedrooms` | Number of bedrooms (1–5) |
| `bathrooms` | Number of bathrooms (1–3) |
| `area_sqft` | Living area in square feet (800–4000) |
| `lot_size` | Lot size in square feet (2000–10000) |
| `year_built` | Year the house was built (1960–2022) |
| `city` | City encoded as integer (0–4) |
| `state` | State encoded as integer (0–2) |

Target: `price = area_sqft * 200 + bedrooms * 15000 + bathrooms * 10000 + (2023 - year_built) * -500 + noise`
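
A minimal sketch of how rows matching this description could be generated is shown below. The value ranges and the target formula come from the table and formula above. Uniform sampling, the Gaussian noise, and its standard deviation are assumptions, since the noise term is not specified here.

```python
import random


def price_without_noise(row):
    """Deterministic part of the target formula."""
    return (
        row["area_sqft"] * 200
        + row["bedrooms"] * 15000
        + row["bathrooms"] * 10000
        + (2023 - row["year_built"]) * -500
    )


def make_house(rng, noise_std=20000.0):
    """Sample one house uniformly from the documented ranges (assumed uniform)."""
    row = {
        "bedrooms": rng.randint(1, 5),
        "bathrooms": rng.randint(1, 3),
        "area_sqft": rng.randint(800, 4000),
        "lot_size": rng.randint(2000, 10000),
        "year_built": rng.randint(1960, 2022),
        "city": rng.randint(0, 4),
        "state": rng.randint(0, 2),
    }
    # Noise model is an assumption: zero-mean Gaussian.
    row["price"] = price_without_noise(row) + rng.gauss(0.0, noise_std)
    return row


rng = random.Random(0)
houses = [make_house(rng) for _ in range(500)]

# Worked example: 1800 sqft, 3 bedrooms, 2 bathrooms, built in 2003.
example = {"area_sqft": 1800, "bedrooms": 3, "bathrooms": 2, "year_built": 2003}
print(price_without_noise(example))  # 360000 + 45000 + 20000 - 10000 = 415000
```

Note that `lot_size`, `city`, and `state` do not appear in the target formula, so a well-fitted model should assign them little importance.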