[WIP] Fix minimum-permissions and custom-tag documentation gaps #37965
Draft
kevinzenghu wants to merge 1 commit into
Sourced from mining ~5 months of Slack support/sales archives for recurring patterns: "minimum permissions" confusion (17 distinct threads) and "custom tags not propagating" (recurring since Nov 2025, same unresolved answer every time). Each fix below is grounded in crawler code, not just the reported symptom.

## Permissions

- glue.md: IAM policy was missing `glue:GetTags`. Confirmed via libs/glue/client.go (the crawler calls GetTags) and the crawler's dedicated warning path (glue_crawler.go) for exactly this access-denied case. Jobs still sync without it, just untagged, which makes the gap easy to miss. Added the policy action plus a note explaining the symptom.
- jobs_monitoring/databricks/_index.md: added a Prerequisites section stating that Jobs Monitoring requires Workspace Admin for the recommended install path (unlike Quality Monitoring, which doesn't). This was previously only mentioned deep in an "Advanced Configuration > Permissions" subsection, never up front or contrasted with QM.
- dbt.md: added a Troubleshooting section for silent webhook/connection failures caused by a token's permission scope being lowered after setup. Grounded in DbtCloudHealthWorker's authorization-error paths and VerifyAccess's 403 check (shared/libs/dbtcloud/client.go), which already test for exactly this but weren't surfaced in docs.

## Custom tags

- airflow.md: added a "Custom tags" section documenting that DAG-level `tags=[...]` auto-propagate to Jobs Monitoring with zero configuration. Confirmed via lineage-processor's GetAirflowTags() and the OpenLineage DAG facet parsing; this behavior was completely undocumented.
- kubernetes.md: added `-Ddd.tags` JVM-option documentation, matching the DD_TAGS pattern already documented for EMR and Dataproc but missing here.
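Because the missing `glue:GetTags` action fails quietly (jobs sync, just untagged), a reviewer may want a quick way to check whether an existing policy document grants it. The following is a minimal sketch, not part of this PR: the helper name `allows_action` and both sample policies are hypothetical, and the other Glue actions shown are placeholders rather than the contents of glue.md. It only inspects `Allow` statements and `Action` patterns; it ignores `Deny`, `NotAction`, `Resource`, and `Condition`, so treat a positive result as a hint rather than proof of effective access.

```python
import fnmatch


def allows_action(policy: dict, action: str) -> bool:
    """Return True if any Allow statement has an Action pattern matching `action`.

    Simplified check: Deny statements, NotAction, Resource, and Condition
    are not evaluated.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        for pattern in actions:
            # IAM action names are case-insensitive and may use * and ? wildcards.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


# Hypothetical policy resembling the pre-fix state (no tag access).
policy_before = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["glue:GetJobs", "glue:GetJobRuns"],
            "Resource": "*",
        }
    ],
}

# The same hypothetical policy with the action this PR adds to the docs.
policy_after = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["glue:GetJobs", "glue:GetJobRuns", "glue:GetTags"],
            "Resource": "*",
        }
    ],
}

print(allows_action(policy_before, "glue:GetTags"))  # False
print(allows_action(policy_after, "glue:GetTags"))   # True
```

A policy that grants a wildcard such as `glue:Get*` would also pass this check, which matches how IAM treats action patterns.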
## Explicitly not changed (flagging instead of guessing)

- Databricks Quality/Jobs Monitoring custom-tag auto-capture: the docs were already fixed for this on 2026-06-01 (commit bcef067). Native cluster tags now auto-propagate, except Azure resource-group tags. What is stale is the "no auto-propagation" answer reps keep giving in Slack, not the docs. This is a team-communication gap, not something for this PR to fix. Worth telling the support/sales team that the answer changed.
- BigQuery external tables backed by Google Sheets/Drive needing extra credentials/scope: confirmed via code that this case isn't handled by existing error-handling paths (unlike the two other BigQuery external-table failure modes, which are) and isn't documented. However, I couldn't confirm the exact required role/scope from code alone. Needs input from the BigQuery crawler owner before writing content; not guessing at Drive API scope names.

AI assistance: found and fixed by Claude Code via a targeted research pass grounded in dd-source crawler/processor code for each specific Slack-reported gap. Flagging per the AI-assistance disclosure in CONTRIBUTING.md.
Summary
Sourced from mining ~5 months of Slack support/sales archives for recurring patterns (see linked analysis): "minimum permissions" confusion (17 threads) and "custom tags not propagating" (recurring since Nov 2025). Each fix is grounded in crawler/processor code, not just the reported symptom.
Fixes
- glue.md: added `glue:GetTags` to the IAM policy, confirmed via `libs/glue/client.go` (crawler calls GetTags) and a dedicated crawler warning path for exactly this access-denied case.
- dbt.md: added a Troubleshooting section, grounded in `DbtCloudHealthWorker`'s existing authorization-error checks.
- airflow.md: documented DAG-level tag auto-propagation (`GetAirflowTags()`), previously undocumented.
- kubernetes.md: `-Ddd.tags` added, matching the pattern already documented for EMR/Dataproc.

Found but explicitly not fixed here

- Databricks custom-tag auto-capture: the docs were already fixed (commit bcef0674fd), but reps in Slack are giving a now-stale "no auto-propagation" answer. This is a team-communication gap, not a doc gap. Flagging for whoever owns support enablement, not fixing here.

Status
Work in progress — not ready for review yet. Draft, no reviewers requested.
AI assistance
Found and fixed by Claude Code via a targeted research pass grounded in dd-source code for each specific gap — flagging per the AI-assistance disclosure in CONTRIBUTING.md.