Commit 61e9ecd

Add automated duplicate for new issue detection and auto-close workflows (#5276)
* Add automated duplicate issue detection and auto-close workflows

Implements a 3-workflow system using claude-code-action with Bedrock OIDC:
- claude-dedupe-issues.yml: detects duplicates on new issues via Claude
- auto-close-duplicates.yml: daily cron closes flagged issues after 3 days
- remove-autoclose-on-activity.yml: removes autoclose label on human comment

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Use duplicate label for detection, autoclose label after closing

- Detected duplicates now get `duplicate` label instead of `autoclose`
- Auto-close workflow looks for `duplicate` label
- After closing, adds `autoclose` label
- Human comment removes `duplicate` label to prevent auto-closure
- Fix state_reason to `duplicate`
- Change grace period to 1 hour for testing

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add backfill workflow and improve duplicate comment message

- Add backfill-duplicate-comments.yml to scan historical issues for duplicates
- Add thumbs-down instruction to duplicate detection comment

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Rename remove-autoclose-on-activity to remove-duplicate-on-activity

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix security issues found by Code-Diff-Analyzer

- Remove unnecessary allowed_non_write_users from dedupe workflow
- Pass workflow inputs via env vars to prevent JS injection in backfill
- Use bash array for REPO_FLAG to prevent word splitting in shell script

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Allow github-actions bot to trigger dedupe workflow

Backfill workflow dispatches dedupe via API as github-actions[bot], which requires explicit allowlisting in claude-code-action.

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Use BEDROCK_ACCESS_ROLE and us-east-1 to match existing pr_review workflow

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Make duplicate issue grace period configurable, default to 7 days

Read from repo variable DUPLICATE_GRACE_DAYS (default 7) instead of hardcoded 3 days for both auto-close workflow and comment script.

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Filter out newer and already-duplicate issues in dedupe search

Only consider issues with lower issue numbers as potential originals, and exclude issues already labeled duplicate from search results.

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove backfill workflow to reduce risk for initial rollout

Only run duplicate detection on newly created issues for now.

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
1 parent cb539e2 commit 61e9ecd

5 files changed

Lines changed: 340 additions & 0 deletions

.claude/commands/dedupe.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+allowed-tools: Bash(gh:*), Bash(./scripts/comment-on-duplicates.sh:*)
+description: Find duplicate GitHub issues
+---
+
+Find up to 3 likely duplicate issues for a given GitHub issue.
+
+Follow these steps precisely:
+
+1. Use `gh issue view <number>` to read the issue. If the issue is closed, or is broad product feedback without a specific bug/feature request, or already has a duplicate detection comment (containing `<!-- duplicate-detection -->`), stop and report why you are not proceeding.
+
+2. Summarize the issue's core problem in 2-3 sentences. Identify the key terms, error messages, and affected components.
+
+3. Search for potential duplicates using **at least 3 different search strategies**. Run these searches in parallel. **Only consider issues with a lower issue number** (older issues) as potential originals — skip any result with a number >= the current issue. Also skip issues already labeled `duplicate`.
+   - `gh search issues "<exact error message or key phrase>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<component or feature keywords>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<alternate description of the problem>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<key terms>" --repo $GITHUB_REPOSITORY --state all -- -label:duplicate --limit 10 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'` (include closed issues for reference)
+
+4. For each candidate issue that looks like a potential match, read it with `gh issue view <number>` to verify it is truly about the same problem. Filter out false positives — issues that merely share keywords but describe different problems.
+
+5. If you find 1-3 genuine duplicates, post the result using the comment script:
+   ```
+   ./scripts/comment-on-duplicates.sh --base-issue <issue-number> --potential-duplicates <dup1> [dup2] [dup3]
+   ```
+
+6. If no genuine duplicates are found, report that no duplicates were detected and take no further action.
+
+Important notes:
+- Only flag issues as duplicates when you are confident they describe the **same underlying problem**
+- Prefer open issues as duplicates, but closed issues can be referenced too
+- Do not flag the issue as a duplicate of itself
+- The base issue number is the last part of the issue reference (e.g., for `owner/repo/issues/42`, the number is `42`)
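
The candidate filtering in step 3 (older issue numbers only, nothing already labeled `duplicate`) can be sketched in JavaScript. This is an illustration, not code from the commit: `filterCandidates` is a hypothetical helper, and it assumes each search result carries a `labels` array, whereas the `gh` commands above request only `number,title` and exclude labeled issues in the search query itself.

```javascript
// Hypothetical helper (not part of the commit): keep only results that could
// be the "original" issue for the one being checked.
function filterCandidates(results, currentIssueNumber) {
  return results.filter((issue) => {
    const labels = issue.labels || []; // assumed field, see lead-in
    const isOlder = issue.number < currentIssueNumber;
    const alreadyDuplicate = labels.some((label) => label.name === 'duplicate');
    return isOlder && !alreadyDuplicate;
  });
}

// Made-up sample data for illustration.
const sample = [
  { number: 10, title: 'Crash on startup', labels: [] },
  { number: 42, title: 'Crash on startup (again)', labels: [] },
  { number: 7, title: 'Crash at boot', labels: [{ name: 'duplicate' }] },
  { number: 55, title: 'Newer report', labels: [] },
];

// Only issue 10 is both older than 42 and not already labeled duplicate.
console.log(filterCandidates(sample, 42).map((issue) => issue.number));
```

Filtering on the issue number rather than the creation date mirrors the `jq` expression `select(.number < <current-issue-number>)` used in the command file.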

.github/workflows/auto-close-duplicates.yml

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+name: Auto-close duplicate issues
+
+on:
+  schedule:
+    - cron: "0 9 * * *"
+  workflow_dispatch:
+
+permissions:
+  issues: write
+
+jobs:
+  auto-close-duplicates:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    steps:
+      - name: Close stale duplicate issues
+        uses: actions/github-script@v7
+        env:
+          GRACE_DAYS: ${{ vars.DUPLICATE_GRACE_DAYS || '7' }}
+        with:
+          script: |
+            const { owner, repo } = context.repo;
+            const graceDays = parseInt(process.env.GRACE_DAYS, 10) || 7;
+            const GRACE_PERIOD_MS = graceDays * 24 * 60 * 60 * 1000;
+            const now = Date.now();
+
+            // Find all open issues with the duplicate label
+            const issues = await github.paginate(github.rest.issues.listForRepo, {
+              owner,
+              repo,
+              state: 'open',
+              labels: 'duplicate',
+              per_page: 100,
+            });
+
+            console.log(`Found ${issues.length} open issues with duplicate label`);
+
+            let closedCount = 0;
+
+            for (const issue of issues) {
+              console.log(`Processing issue #${issue.number}: ${issue.title}`);
+
+              // Get comments to find the duplicate detection comment
+              const comments = await github.rest.issues.listComments({
+                owner,
+                repo,
+                issue_number: issue.number,
+                per_page: 100,
+              });
+
+              // Find the duplicate detection comment (posted by our script)
+              const dupeComments = comments.data.filter(c =>
+                c.body.includes('<!-- duplicate-detection -->')
+              );
+
+              if (dupeComments.length === 0) {
+                console.log(` No duplicate detection comment found, skipping`);
+                continue;
+              }
+
+              const lastDupeComment = dupeComments[dupeComments.length - 1];
+              const dupeCommentAge = now - new Date(lastDupeComment.created_at).getTime();
+
+              if (dupeCommentAge < GRACE_PERIOD_MS) {
+                const daysLeft = ((GRACE_PERIOD_MS - dupeCommentAge) / (24 * 60 * 60 * 1000)).toFixed(1);
+                console.log(` Duplicate comment is too recent (${daysLeft} days remaining), skipping`);
+                continue;
+              }
+
+              // Check for human comments after the duplicate detection comment
+              const humanCommentsAfter = comments.data.filter(c =>
+                new Date(c.created_at) > new Date(lastDupeComment.created_at) &&
+                c.user.type !== 'Bot' &&
+                !c.body.includes('<!-- duplicate-detection -->') &&
+                !c.body.includes('automatically closed as a duplicate')
+              );
+
+              if (humanCommentsAfter.length > 0) {
+                console.log(` Has ${humanCommentsAfter.length} human comment(s) after detection, skipping`);
+                continue;
+              }
+
+              // Check for thumbs-down reaction from the issue author
+              const reactions = await github.rest.reactions.listForIssueComment({
+                owner,
+                repo,
+                comment_id: lastDupeComment.id,
+                per_page: 100,
+              });
+
+              const authorThumbsDown = reactions.data.some(r =>
+                r.user.id === issue.user.id && r.content === '-1'
+              );
+
+              if (authorThumbsDown) {
+                console.log(` Issue author gave thumbs-down on duplicate comment, skipping`);
+                continue;
+              }
+
+              // Extract the primary duplicate issue number from the comment
+              const dupeMatch = lastDupeComment.body.match(/#(\d+)/);
+              const dupeNumber = dupeMatch ? dupeMatch[1] : 'unknown';
+
+              // Close the issue
+              console.log(` Closing as duplicate of #${dupeNumber}`);
+
+              await github.rest.issues.update({
+                owner,
+                repo,
+                issue_number: issue.number,
+                state: 'closed',
+                state_reason: 'duplicate',
+              });
+
+              await github.rest.issues.addLabels({
+                owner,
+                repo,
+                issue_number: issue.number,
+                labels: ['autoclose'],
+              });
+
+              await github.rest.issues.createComment({
+                owner,
+                repo,
+                issue_number: issue.number,
+                body: `This issue has been automatically closed as a duplicate of #${dupeNumber}.\n\nIf this is incorrect, please reopen this issue or create a new one.\n\n🤖 Generated with [Claude Code](https://claude.ai/code)`,
+              });
+
+              closedCount++;
+            }
+
+            console.log(`Done. Closed ${closedCount} duplicate issue(s).`);
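
The skip checks in this workflow run in a fixed order: detection comment present, grace period elapsed, no human reply, no thumbs-down from the author. A minimal sketch of that order as a pure function, which could be exercised without the GitHub API, might look like the following. `decideAutoClose`, its return shape, and the sample data are hypothetical and not part of the commit; `reactions` is assumed to be the reaction list for the latest detection comment.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const MARKER = '<!-- duplicate-detection -->';

// Hypothetical pure function (not part of the commit) that mirrors the
// workflow's skip checks in the same order.
function decideAutoClose(issue, comments, reactions, now, graceDays) {
  const detections = comments.filter((c) => c.body.includes(MARKER));
  if (detections.length === 0) return { close: false, reason: 'no-detection-comment' };

  const last = detections[detections.length - 1];
  const postedAt = new Date(last.created_at).getTime();
  if (now - postedAt < graceDays * DAY_MS) return { close: false, reason: 'grace-period' };

  const humanReplied = comments.some(
    (c) =>
      new Date(c.created_at).getTime() > postedAt &&
      c.user.type !== 'Bot' &&
      !c.body.includes(MARKER) &&
      !c.body.includes('automatically closed as a duplicate')
  );
  if (humanReplied) return { close: false, reason: 'human-comment' };

  const authorVeto = reactions.some((r) => r.user.id === issue.user.id && r.content === '-1');
  if (authorVeto) return { close: false, reason: 'author-thumbs-down' };

  // The first "#<number>" in the comment is treated as the primary duplicate.
  const match = last.body.match(/#(\d+)/);
  return { close: true, reason: 'expired', duplicateOf: match ? match[1] : 'unknown' };
}

// Made-up sample data for illustration.
const issue = { number: 50, user: { id: 1 } };
const comments = [
  {
    body: `${MARKER}\n- #12 Earlier report`,
    created_at: '2024-01-01T00:00:00Z',
    user: { id: 99, type: 'Bot' },
  },
];
const now = Date.parse('2024-01-10T00:00:00Z');

console.log(decideAutoClose(issue, comments, [], now, 7));
console.log(decideAutoClose(issue, comments, [{ user: { id: 1 }, content: '-1' }], now, 7));
```

With this sample data the detection comment is nine days old, so the first call reports `close: true` with `duplicateOf: '12'`, while the second is vetoed by the author's thumbs-down.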

.github/workflows/claude-dedupe-issues.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+name: Claude Issue Dedupe
+
+on:
+  issues:
+    types: [opened]
+  workflow_dispatch:
+    inputs:
+      issue_number:
+        description: 'Issue number to check for duplicates'
+        required: true
+        type: string
+
+permissions:
+  contents: read
+  issues: write
+  id-token: write
+
+jobs:
+  dedupe:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Configure AWS Credentials (OIDC)
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          role-to-assume: ${{ secrets.BEDROCK_ACCESS_ROLE }}
+          aws-region: us-east-1
+
+      - name: Run duplicate detection
+        uses: anthropics/claude-code-action@v1
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+          DUPLICATE_GRACE_DAYS: ${{ vars.DUPLICATE_GRACE_DAYS }}
+        with:
+          use_bedrock: "true"
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          allowed_bots: "github-actions[bot]"
+          prompt: "/dedupe ${{ github.repository }}/issues/${{ github.event.issue.number || inputs.issue_number }}"
+          claude_args: "--model us.anthropic.claude-sonnet-4-5-20250929-v1:0"
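
The `prompt` input falls back from the event's issue number to the manually supplied `issue_number`, and the command file then takes the last path segment as the base issue number. A small sketch of both steps, using the hypothetical helpers `buildPrompt` and `parseIssueRef` (neither appears in the commit):

```javascript
// Hypothetical helper: mirrors the `a || b` fallback in the workflow's prompt.
function buildPrompt(repository, eventIssueNumber, inputIssueNumber) {
  const number = eventIssueNumber || inputIssueNumber;
  return `/dedupe ${repository}/issues/${number}`;
}

// Hypothetical helper: split "owner/repo/issues/42" into its parts.
function parseIssueRef(ref) {
  const match = /^([^/\s]+)\/([^/\s]+)\/issues\/(\d+)$/.exec(ref.trim());
  if (!match) throw new Error(`Unrecognized issue reference: ${ref}`);
  return { owner: match[1], repo: match[2], number: Number(match[3]) };
}

// On an `issues: opened` event the event number wins; on a manual dispatch
// there is no event issue, so the input is used instead.
console.log(buildPrompt('owner/repo', 42, undefined));
console.log(buildPrompt('owner/repo', undefined, '17'));
console.log(parseIssueRef('owner/repo/issues/42'));
```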

.github/workflows/remove-duplicate-on-activity.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Remove duplicate label on activity
+
+on:
+  issue_comment:
+    types: [created]
+
+permissions:
+  issues: write
+
+jobs:
+  remove-duplicate:
+    if: |
+      github.event.issue.state == 'open' &&
+      contains(github.event.issue.labels.*.name, 'duplicate') &&
+      github.event.comment.user.type != 'Bot'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Remove duplicate label
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const { owner, repo } = context.repo;
+            const issueNumber = context.issue.number;
+            const commenter = context.payload.comment.user.login;
+
+            console.log(`Removing duplicate label from issue #${issueNumber} due to comment from ${commenter}`);
+
+            try {
+              await github.rest.issues.removeLabel({
+                owner,
+                repo,
+                issue_number: issueNumber,
+                name: 'duplicate',
+              });
+              console.log(`Successfully removed duplicate label from issue #${issueNumber}`);
+            } catch (error) {
+              if (error.status === 404) {
+                console.log(`duplicate label was already removed from issue #${issueNumber}`);
+              } else {
+                throw error;
+              }
+            }
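
The job-level `if:` condition gates the whole workflow on three facts about the `issue_comment` event. As a rough illustration, the same predicate can be written in JavaScript against a trimmed-down event payload. `shouldRemoveDuplicateLabel` is a hypothetical name, and the payload objects below are made-up samples containing only the fields the condition reads.

```javascript
// Hypothetical predicate (not part of the commit) equivalent to the job's
// `if:` expression: open issue, labeled duplicate, comment not from a bot.
function shouldRemoveDuplicateLabel(event) {
  const labelNames = event.issue.labels.map((label) => label.name);
  return (
    event.issue.state === 'open' &&
    labelNames.includes('duplicate') &&
    event.comment.user.type !== 'Bot'
  );
}

// Made-up payloads for illustration.
const humanComment = {
  issue: { state: 'open', labels: [{ name: 'bug' }, { name: 'duplicate' }] },
  comment: { user: { type: 'User', login: 'reporter' } },
};
const botComment = {
  issue: { state: 'open', labels: [{ name: 'duplicate' }] },
  comment: { user: { type: 'Bot', login: 'github-actions[bot]' } },
};

console.log(shouldRemoveDuplicateLabel(humanComment)); // true
console.log(shouldRemoveDuplicateLabel(botComment)); // false
```

Excluding bot comments matters here because the detection comment itself is posted by a bot on the same issue that receives the `duplicate` label.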

scripts/comment-on-duplicates.sh

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+#!/bin/bash
+#
+# Copyright OpenSearch Contributors
+# SPDX-License-Identifier: Apache-2.0
+#
+# Posts a formatted duplicate detection comment and adds the duplicate label.
+#
+# Usage:
+#   ./scripts/comment-on-duplicates.sh --base-issue 123 --potential-duplicates 456 789
+
+set -euo pipefail
+
+REPO="${GITHUB_REPOSITORY:-}"
+BASE_ISSUE=""
+DUPLICATES=()
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    --base-issue)
+      BASE_ISSUE="$2"
+      shift 2
+      ;;
+    --potential-duplicates)
+      shift
+      while [[ $# -gt 0 && ! "$1" =~ ^-- ]]; do
+        DUPLICATES+=("$1")
+        shift
+      done
+      ;;
+    *)
+      echo "Unknown argument: $1" >&2
+      exit 1
+      ;;
+  esac
+done
+
+if [[ -z "$BASE_ISSUE" ]]; then
+  echo "Error: --base-issue is required" >&2
+  exit 1
+fi
+
+if [[ ${#DUPLICATES[@]} -eq 0 ]]; then
+  echo "Error: --potential-duplicates requires at least one issue number" >&2
+  exit 1
+fi
+
+REPO_FLAG=()
+if [[ -n "$REPO" ]]; then
+  REPO_FLAG=("--repo" "$REPO")
+fi
+
+# Build duplicate list
+DUP_LIST=""
+for dup in "${DUPLICATES[@]}"; do
+  TITLE=$(gh issue view "$dup" "${REPO_FLAG[@]}" --json title -q .title 2>/dev/null || echo "")
+  if [[ -n "$TITLE" ]]; then
+    DUP_LIST+="- #${dup}${TITLE}"$'\n'
+  else
+    DUP_LIST+="- #${dup}"$'\n'
+  fi
+done
+
+# Build the comment body with a hidden marker for auto-close detection
+BODY="<!-- duplicate-detection -->
+### Possible Duplicate
+
+Found **${#DUPLICATES[@]}** possible duplicate issue(s):
+
+${DUP_LIST}
+If this is **not** a duplicate:
+- Add a comment on this issue, and the \`duplicate\` label will be removed automatically, or
+- 👎 this comment to prevent auto-closure
+
+Otherwise, this issue will be **automatically closed in ${DUPLICATE_GRACE_DAYS:-7} days**.
+
+🤖 Generated with [Claude Code](https://claude.ai/code)"
+
+# Post the comment
+echo "$BODY" | gh issue comment "$BASE_ISSUE" "${REPO_FLAG[@]}" --body-file -
+
+# Ensure the duplicate label exists
+gh label create "duplicate" \
+  --description "Issue is a duplicate of an existing issue" \
+  --color "cccccc" \
+  "${REPO_FLAG[@]}" 2>/dev/null || true
+
+# Add duplicate label
+gh issue edit "$BASE_ISSUE" "${REPO_FLAG[@]}" --add-label "duplicate"
+
+echo "Posted duplicate comment and added duplicate label to issue #${BASE_ISSUE}"
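
The script's argument loop lets `--potential-duplicates` consume values until the next `--` flag or the end of the arguments. The same parsing strategy can be sketched in JavaScript to make the control flow easier to follow. `parseArgs` is a hypothetical helper, not part of the commit, and it reports a missing `--base-issue` value through the "required" error rather than through bash's unbound-variable failure.

```javascript
// Hypothetical re-implementation (not part of the commit) of the script's
// argument loop: one value for --base-issue, a variable-length list for
// --potential-duplicates that ends at the next "--" flag.
function parseArgs(argv) {
  let baseIssue = '';
  const duplicates = [];
  let i = 0;

  while (i < argv.length) {
    const arg = argv[i];
    if (arg === '--base-issue') {
      baseIssue = argv[i + 1] ?? '';
      i += 2;
    } else if (arg === '--potential-duplicates') {
      i += 1;
      while (i < argv.length && !argv[i].startsWith('--')) {
        duplicates.push(argv[i]);
        i += 1;
      }
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }

  if (!baseIssue) throw new Error('--base-issue is required');
  if (duplicates.length === 0) {
    throw new Error('--potential-duplicates requires at least one issue number');
  }
  return { baseIssue, duplicates };
}

// Same invocation as the usage line in the script header.
console.log(parseArgs(['--base-issue', '123', '--potential-duplicates', '456', '789']));
```

Because the list stops at the next flag, the two options can appear in either order on the command line.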
