Merge latest changes from topic-release to main#216
Conversation
Added new Lambdafunction.yml to support Lambda function deployment for egress workflow
…to support new products fix(bootstrap): restore S3 mounts check in bootstrap script
feat(settings): added settings configuration in settings-config.json to support new products
…ation and error handling
feat(catalog): remove deprecated catalog items and update Amazon Sagemaker name
…ttempts are being made.
…eploy into topic-release
… the command to pull the latest version of FUSE, and add permission to describe VPCs in the project account Role.
Update the base account policy to modify the S3 bucket policy, update the command to pull the latest version of FUSE, and add permission to describe VPCs in the project account Role.
Fix: not able to attach cross account policy to rg portal role.
…arameters; enhance bootstrap script for environment instance setup.
Update pcs_quick_lt.yaml: Add new IAM policy document and S3 mounts parameters; enhance bootstrap script for environment instance setup.
Merge updates from main to topic-release
|
Warning Review limit reached
More reviews will be available in 1 minute and 35 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR expands the RGDeploy platform with multiple new cloud services: an S3-based Lambda file archival system, EC2 remote desktop and HPC cluster provisioning, SageMaker domain/user profile setup, and Bedrock AI integration. Updates include CloudFormation templates, Lambda functions, configuration registrations, and deployment automation. ChangesMulti-Service Platform Expansion
Sequence Diagram(s) sequenceDiagram
participant Client
participant LambdaURL as Egress Lambda Function URL
participant EgressLambda as EgressLambda
participant S3 as S3
Client->>LambdaURL: POST {sourceBucket, sourcePrefix, destinationBucket, destinationPrefix}
LambdaURL->>EgressLambda: Invoke (IAM auth)
EgressLambda->>S3: ListObjectsV2(sourceBucket, sourcePrefix)
alt no objects
EgressLambda-->>Client: 404 Not Found
else objects found
EgressLambda->>EgressLambda: Create ZIP (archiver)
EgressLambda->>S3: Upload ZIP stream to destinationBucket/destinationPrefix
EgressLambda-->>Client: 201 Created
end
🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
config/settings-config.json (1)
1502-1524:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd config blocks for the newly listed services.
servicesnow includesec2-spyderandsagemaker-studio, but this file has no matching top-level definitions for either key. Any lookup by service name will resolve toundefined, so those products will not get action/state mappings in the UI.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config/settings-config.json` around lines 1502 - 1524, The services array now lists "ec2-spyder" and "sagemaker-studio" but there are no corresponding top-level config objects for those service keys; add two new top-level entries named "ec2-spyder" and "sagemaker-studio" that mirror the structure used by the other service configs (include the same action/state mappings, displayName, icon/key, and any required UI metadata) so lookups by service name return the expected action/state mappings; ensure the new blocks follow the same property names and schema as existing entries (e.g., how "ec2" or "sagemaker" are defined) so the UI can consume them.SRE/Egress/Setup-templates/egressresources.yml (1)
27-38:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAllow SNS to use the customer-managed KMS key.
In
SRE/Egress/Setup-templates/egressresources.yml,EgressNotificationTopicis encrypted withEgressStoreEncryptionKey, but the CMK key policy currently grants only the account root principal. For SNS-encrypted topics, thesns.amazonaws.comservice principal must be allowedkms:GenerateData*(viakms:GenerateDataKey*) andkms:Decryptso publishes can succeed (scoped to the specific topic via aCondition).Suggested fix
KeyPolicy: Version: "2012-10-17" Id: egress-store-kms-policy Statement: - Sid: Allow root Effect: Allow Principal: AWS: - !Sub "arn:aws:iam::${AWS::AccountId}:root" Action: - "kms:*" Resource: "*" + - Sid: AllowSnsToUseKeyForEgressTopic + Effect: Allow + Principal: + Service: sns.amazonaws.com + Action: + - "kms:Decrypt" + - "kms:GenerateDataKey*" + Resource: "*" + Condition: + StringEquals: + aws:SourceAccount: !Ref AWS::AccountId + ArnLike: + aws:SourceArn: !Ref EgressNotificationTopic🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/Egress/Setup-templates/egressresources.yml` around lines 27 - 38, The KMS key policy (KeyPolicy on EgressStoreEncryptionKey) only allows the account root; add a new Statement permitting the sns.amazonaws.com service principal to call kms:GenerateDataKey* and kms:Decrypt for the EgressNotificationTopic. Specifically, add a Statement under KeyPolicy that sets Principal.Service to "sns.amazonaws.com", Action to ["kms:GenerateDataKey*", "kms:Decrypt"], Effect to Allow, and include a Condition that scopes the permission to the specific topic (e.g., StringEquals on the SNS encryption context or aws:SourceArn referencing EgressNotificationTopic) so publishes to EgressNotificationTopic succeed while remaining scoped to that topic.
🟠 Major comments (23)
cft-templates/bedrock.yaml-64-76 (1)
64-76:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftSecurity: IAM role creation and PassRole permissions enable privilege escalation.
The combination of
iam:CreateRole,iam:AttachRolePolicy,iam:PutRolePolicy, andiam:PassRoleonResource: "*"allows this role to create new IAM roles with arbitrary permissions and pass them to AWS services. This is a privilege escalation risk.If Bedrock Agents or Knowledge Bases require creating service-linked roles, scope these permissions down to specific resource patterns (e.g.,
arn:aws:iam::${AWS::AccountId}:role/service-role/AmazonBedrock*). If these permissions are not required, remove them entirely.🔒 Recommended fix to scope IAM permissions
- Effect: Allow Action: - "iam:CreateRole" - "iam:AttachRolePolicy" - "iam:PutRolePolicy" + - "iam:GetRole" + - "iam:PassRole" + - "iam:ListAttachedRolePolicies" + Resource: + - !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AmazonBedrock*" + - Effect: Allow + Action: - "lambda:ListFunctions" - "lambda:GetFunction" - "lambda:InvokeFunction" - - "iam:CreateRole" - - "iam:GetRole" - - "iam:PassRole" - - "iam:ListAttachedRolePolicies" Resource: "*"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/bedrock.yaml` around lines 64 - 76, The IAM statement currently grants iam:CreateRole, iam:AttachRolePolicy, iam:PutRolePolicy, iam:PassRole (plus related iam:GetRole, iam:ListAttachedRolePolicies and lambda actions) with Resource: "*", which allows privilege escalation; update the policy so that iam:CreateRole/AttachRolePolicy/PutRolePolicy/PassRole are either removed if not required or scoped to a narrow ARN pattern (for example only service-linked Bedrock roles such as arn:aws:iam::${AWS::AccountId}:role/service-role/AmazonBedrock*), keep read-only actions (iam:GetRole, iam:ListAttachedRolePolicies, lambda:ListFunctions, lambda:GetFunction, lambda:InvokeFunction) as needed, and ensure Resource entries reference the specific role ARNs rather than "*" to eliminate broad role creation/passing rights.cft-templates/bedrock.yaml-29-34 (1)
29-34:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSecurity: Allowing account root to assume role is overly permissive.
Allowing the account root principal (
arn:aws:iam::${AWS::AccountId}:root) to assume this role means any IAM identity in the account withsts:AssumeRolepermission can assume this role. This violates the principle of least privilege and could lead to unauthorized access.If cross-account or cross-service access is not required, consider restricting the trust policy to only
bedrock.amazonaws.com. If specific IAM principals need access, explicitly list them instead of using the root principal.🛡️ Recommended fix to remove root principal
AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Principal: Service: bedrock.amazonaws.com Action: sts:AssumeRole - - Effect: Allow - Principal: - AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root" - Action: - - sts:AssumeRole - - sts:TagSession🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/bedrock.yaml` around lines 29 - 34, The trust policy currently allows the account root principal ("arn:aws:iam::${AWS::AccountId}:root") to assume the role which is overly permissive; update the Principal section of the role's trust policy to remove that root ARN and instead specify a narrow principal such as Service: "bedrock.amazonaws.com" if only Bedrock needs access, or explicitly list the specific IAM roles/users that require sts:AssumeRole and sts:TagSession; locate the Principal block and the Action entries (sts:AssumeRole, sts:TagSession) in the bedrock.yaml trust policy and replace the AWS root principal with the appropriate Service principal or an explicit list of allowed IAM principals.cft-templates/bedrock.yaml-36-37 (1)
36-37:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftSecurity: AmazonS3FullAccess grants unrestricted access to all S3 buckets.
The
AmazonS3FullAccessmanaged policy grants full access to all S3 buckets and objects in the account. This is a significant security risk and violates the principle of least privilege. Bedrock workloads typically require access to specific S3 buckets for data sources, model artifacts, or outputs—not all buckets.Replace this with a custom inline policy that grants access only to the specific S3 buckets needed for Bedrock operations.
🔒 Recommended fix with scoped S3 permissions
ManagedPolicyArns: - arn:aws:iam::aws:policy/AmazonBedrockFullAccess - - arn:aws:iam::aws:policy/AmazonS3FullAccess Policies: - PolicyName: BedrockExecutionRolePolicy PolicyDocument: Version: "2012-10-17" Statement: + - Effect: Allow + Action: + - "s3:GetObject" + - "s3:PutObject" + - "s3:DeleteObject" + Resource: + - "arn:aws:s3:::your-bedrock-data-bucket/*" + - "arn:aws:s3:::your-bedrock-model-bucket/*" + - Effect: Allow + Action: + - "s3:ListBucket" + Resource: + - "arn:aws:s3:::your-bedrock-data-bucket" + - "arn:aws:s3:::your-bedrock-model-bucket" - Effect: AllowReplace
your-bedrock-data-bucketandyour-bedrock-model-bucketwith the actual bucket names or use parameters to make this configurable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/bedrock.yaml` around lines 36 - 37, Replace the overly broad managed policy arn:aws:iam::aws:policy/AmazonS3FullAccess in the bedrock.yaml policy list with a scoped inline IAM policy that grants least-privilege S3 actions only on the specific buckets used by Bedrock (e.g., GetObject, PutObject, ListBucket on your-bedrock-data-bucket and your-bedrock-model-bucket), keep the existing AmazonBedrockFullAccess entry, and make bucket names configurable via CloudFormation parameters; update the role or policy resource (the location where AmazonS3FullAccess is referenced) to attach this inline policy instead of the managed ARN.dump/configs.json-596-605 (1)
596-605:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove the duplicated
chatAssistantConfigseed record.This block repeats the same
_idandkeythat already appear earlier in the dump. Re-importing this file will either fail with a duplicate-key error or leave seed state dependent on import behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dump/configs.json` around lines 596 - 605, Remove the duplicated seed document for the config with key "chatAssistantConfig" and _id "663b604476964689564dc998" from the JSON dump so only one instance remains; locate the object containing "key": "chatAssistantConfig" (and the matching "$oid": "663b604476964689564dc998") and delete that entire object (and any trailing comma it leaves) to avoid duplicate-key import errors.dump/configs.json-606-612 (1)
606-612:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the standard
valuefield name forkmsKeys.This document stores its payload under
Value, but the rest of the config dump usesvalue. If the loader readsvalue, this new setting will be ignored at runtime.Suggested fix
- "Value": [ ] + "value": []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dump/configs.json` around lines 606 - 612, The document for key "kmsKeys" uses the nonstandard field name "Value" instead of the expected "value", so the config loader will ignore it; update the document payload for the "kmsKeys" entry to use the lowercase "value" field (or add a "value" alias mirroring "Value") so the loader finds the setting at runtime.deploy.sh-228-308 (1)
228-308:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate bucket contents before honoring
skip_s3_copy.Skipping this block on a fresh or partially staged bucket leaves the templates, lambdas, bootstrap scripts, and seed data absent, but the rest of the deploy still assumes they exist. That turns the flag into a late deployment failure instead of a safe fast-path for repeat runs.
Suggested guard
if ! [ "$SKIP_S3_COPY" = "true" ]; then echo "Copying deployment files to bucket $bucketname" ... else - echo "Skipping S3 copy as per user request" + required_objects=( + docker-compose.yml + nginx.conf + updatescripts.sh + config.tar.gz + scripts.tar.gz + dump.zip + ) + for object in "${required_objects[@]}"; do + aws s3api head-object --bucket "$bucketname" --key "$object" >/dev/null 2>&1 || { + echo "Cannot skip S3 copy: s3://$bucketname/$object is missing." + exit 1 + } + done + echo "Skipping S3 copy as per user request" fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy.sh` around lines 228 - 308, When SKIP_S3_COPY is true the script currently skips uploading but does not validate that required objects exist in the target bucket; add a validation step in the else branch that checks for the presence of all required S3 objects (use bucketname and the same object keys referenced in the upload section: docker-compose.yml, nginx.conf, updatescripts.sh, cft-templates/ (sync target), config.tar.gz or config/*, scripts.tar.gz, bootstrap-scripts/, the lambda zips, product tarballs and dump.zip) by calling aws s3 ls or aws s3api head-object for each key; if any are missing, print a clear error and exit non‑zero instructing the user to run without SKIP_S3_COPY or re-run the upload so downstream steps relying on those assets don’t fail.upload-assets.sh-62-76 (1)
62-76:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftUploads lack error handling—partial failures go unnoticed.
If any
aws s3 cporaws s3 synccommand fails (e.g., network timeout, permissions), the script continues and exits with success. This could leave the bucket in an inconsistent state.Consider adding
set -enear the top, or appending|| exit 1to critical upload commands, or wrapping uploads in a function that checks return codes.Example: add errexit at script start
#!/bin/bash +set -euo pipefail # Verifying required utilitiesThis causes the script to exit immediately on any command failure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@upload-assets.sh` around lines 62 - 76, The script performs many uploads (aws s3 cp, aws s3 sync) and tar operations without checking failures, so partial failures are possible; modify the script to fail-fast by enabling errexit (add set -euo pipefail near the top) or wrap critical operations (aws s3 cp, aws s3 sync, tar -czf, and rm -f) in a helper function that checks the command exit status and exits with a non‑zero code on failure (reference the aws s3 cp/aws s3 sync invocations and the tar -czf config.tar.gz / tar -tf / rm -f sequence and variables localhome and bucketname when locating where to apply the checks).upload-assets.sh-50-56 (1)
50-56:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedundant wait after
cloudformation deploy.
aws cloudformation deployis synchronous by default—it already waits for the stack operation to complete before returning. The subsequentwait stack-create-completeis redundant and can fail if the stack was updated rather than created (wrong waiter). Check the deploy exit code directly instead.Proposed fix
echo "Bucket $bucketname does not exist. Creating it using CloudFormation..." - aws cloudformation deploy --template-file rg_deploy_bucket.yml --stack-name "$bucketstackname" \ - --parameter-overrides S3NewBucketName="$bucketname" - aws cloudformation wait stack-create-complete --stack-name "$bucketstackname" - if [ $? -ne 0 ]; then + if ! aws cloudformation deploy --template-file rg_deploy_bucket.yml --stack-name "$bucketstackname" \ + --parameter-overrides S3NewBucketName="$bucketname"; then echo "Error: Failed to create S3 bucket via CloudFormation." exit 1 fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@upload-assets.sh` around lines 50 - 56, The script currently calls "aws cloudformation deploy" then redundantly runs "aws cloudformation wait stack-create-complete" which can be incorrect for updates; remove the waiter call and instead check the exit status of the deploy command (the "aws cloudformation deploy --template-file rg_deploy_bucket.yml --stack-name \"$bucketstackname\" --parameter-overrides S3NewBucketName=\"$bucketname\"" line). After the deploy command, test its exit code (e.g. if [ $? -ne 0 ] || use "|| exit 1") and emit the existing error message/exit if it failed; delete the "aws cloudformation wait stack-create-complete --stack-name \"$bucketstackname\"" block and its associated if check.cft-templates/ec2-nginxWebServer.yml-85-98 (1)
85-98:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing
CreationPolicyrenderscfn-signalineffective.The EC2 instance calls
cfn-signalat line 98, but there is noCreationPolicydefined on the resource. Without it, CloudFormation will not wait for the signal and will mark the stack as complete before the bootstrap script finishes. Compare withec2-rockyLinux.ymlwhich correctly defines aCreationPolicywithResourceSignal.Proposed fix to add CreationPolicy
EC2Instance: Type: AWS::EC2::Instance + CreationPolicy: + ResourceSignal: + Timeout: PT15M Properties: UserData:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/ec2-nginxWebServer.yml` around lines 85 - 98, The EC2Instance resource is missing a CreationPolicy so the /opt/aws/bin/cfn-signal call in the UserData will be ignored; add a CreationPolicy to the EC2Instance resource with a ResourceSignal block (e.g., set Count to 1 and an appropriate Timeout like PT15M) so CloudFormation waits for the cfn-signal from the instance; update the EC2Instance resource's properties to include CreationPolicy and ResourceSignal to match the pattern used in ec2-rockyLinux.yml.cft-templates/ec2-rockyLinux.yml-62-76 (1)
62-76:⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoffOverly permissive SSM and KMS policy should be scoped down.
The policy grants
ssm:GetParameter,ssm:PutParameter, and KMS actions onResource: "*". This allows the instance to read/write any SSM parameter and use any KMS key in the account. The instance only needs access to/RL/RG/nice-dcv/auth-token/*parameters.Suggested scope for SSM resources
- Sid: AllowSSMParamActions Effect: Allow Action: - ssm:GetParameter - ssm:PutParameter - ssm:DescribeParameters - Resource: "*" + Resource: !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/RL/RG/nice-dcv/auth-token/*"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/ec2-rockyLinux.yml` around lines 62 - 76, The policy statements Sid: AllowSSMParamActions and Sid: AllowAccessToEncryptionKeys are overly permissive (Resource: "*"); restrict Sid: AllowSSMParamActions to the exact SSM parameter ARNs for the instance (e.g. arn:aws:ssm:<region>:<account-id>:parameter/RL/RG/nice-dcv/auth-token/*) instead of "*", and restrict Sid: AllowAccessToEncryptionKeys to only the specific KMS key ARN(s) used to encrypt those parameters (replace "*" with arn:aws:kms:<region>:<account-id>:key/<key-id>), keeping necessary actions (kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey) but scoped to that key; ensure the parameter ARNs and key ARNs use the correct region and account values for your deployment.cft-templates/windows-remotedesktop.yml-71-91 (1)
71-91:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing
CreationPolicyrenderscfn-signalineffective.The EC2 instance calls
cfn-signal.exeat line 87, but there is noCreationPolicydefined. CloudFormation will mark the stack as complete immediately after launching the instance, without waiting for the user data script to finish.Proposed fix to add CreationPolicy
EC2Instance: Type: AWS::EC2::Instance + CreationPolicy: + ResourceSignal: + Timeout: PT15M Properties: UserData:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/windows-remotedesktop.yml` around lines 71 - 91, The EC2Instance resource uses cfn-signal but lacks a CreationPolicy so CloudFormation won't wait for the signal; add a CreationPolicy block to the EC2Instance resource with a ResourceSignal specifying Count: 1 and an appropriate Timeout (e.g. PT15M) so the stack waits for the cfn-signal.exe call, and ensure the signal's --resource value matches the EC2Instance logical name used in the template.cft-templates/windows-remotedesktop.yml-41-44 (1)
41-44:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOverly broad SSM permissions grant full Systems Manager access.
The policy grants
ssm:*on all resources, which allows the instance to perform any SSM action across the entire account. This violates the principle of least privilege. Scope the permissions to only the required actions and resources.Suggested fix to scope permissions
PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - - ssm:* - Resource: "*" + - ssm:GetParameter + - ssm:PutParameter + Resource: !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/RL/RG/nice-dcv/auth-token/*"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/windows-remotedesktop.yml` around lines 41 - 44, The IAM policy currently grants overly broad Systems Manager permissions via Action: "ssm:*" and Resource: "*" — replace this with a scoped set of required SSM actions (e.g., "ssm:SendCommand", "ssm:GetCommandInvocation", "ssm:DescribeInstanceInformation", "ssm:StartSession" only if needed) and narrow Resource ARNs instead of "*", or restrict to the specific instance(s) by using the instance profile/instance ARN or parameter ARNs (e.g., "arn:aws:ec2:...:instance/INSTANCE_ID" or "arn:aws:ssm:REGION:ACCOUNT_ID:parameter/...") so the Role only has least-privilege access rather than "ssm:*" on all resources.cft-templates/pcs_quick_lt.yaml-68-86 (1)
68-86:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHardcoded VPC, subnet, and security group IDs will fail in other accounts.
Parameters
VPC,DefaultPrivateSubnet,SecurityGroup,DefaultPublicSubnet, andVpcDefaultSecurityGroupIdhave account-specific default values that won't exist in other AWS accounts or regions.Proposed fix - Remove defaults or make required
VPC: Description: VPC to launch the Cluster nodes Type: String - Default: vpc-0e80ca11ecebc561d + AllowedPattern: "^vpc-[a-z0-9]+$" DefaultPrivateSubnet: Description: Private subnet for the PCluster Type: String - Default: subnet-005ed22f10a495974 + AllowedPattern: "^subnet-[a-z0-9]+$"Apply similar changes to
SecurityGroup,DefaultPublicSubnet, andVpcDefaultSecurityGroupId.Also applies to: 112-115
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/pcs_quick_lt.yaml` around lines 68 - 86, The CloudFormation template currently hardcodes account-specific defaults for parameters VPC, DefaultPrivateSubnet, SecurityGroup, DefaultPublicSubnet, and VpcDefaultSecurityGroupId; remove these static Default values (or mark the parameters as Required/no Default) so the template doesn't assume specific AWS resource IDs and will fail in other accounts/regions—update the parameter blocks for VPC, DefaultPrivateSubnet, SecurityGroup, DefaultPublicSubnet, and VpcDefaultSecurityGroupId to omit the Default field (or set no Default and ensure callers supply values) and adjust any code that expects defaults to validate required inputs.cft-templates/pcs_quick_lt.yaml-507-507 (1)
507-507:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHardcoded AMI IDs are region-specific and not configurable.
Both node groups use
ami-08608e2b2243c1f1bwhich is region-specific and will fail in other regions. Unlikepcs.yaml, this template lacks anAmiIdparameter for override.Proposed fix - Add AmiId parameter and use it
Add to Parameters section:
AmiId: Type: String Description: AMI ID to use for PCS cluster nodesThen update node groups:
- AmiId: ami-08608e2b2243c1f1b + AmiId: !Ref AmiIdAlso applies to: 528-528
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/pcs_quick_lt.yaml` at line 507, Add a new CloudFormation parameter named AmiId (Type: String, Description: AMI ID to use for PCS cluster nodes) and replace the hardcoded ami-08608e2b2243c1f1b occurrences in both node group definitions with a reference to that parameter (use !Ref AmiId or {"Ref":"AmiId"} in the AmiId property of the node group resources) so the AMI is configurable per-deployment.cft-templates/pcs.yaml-546-546 (1)
546-546:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHardcoded AMI IDs ignore the
AmiIdparameter.The
AmiIdparameter is defined (line 139) but bothPCSNodeGroupLoginandPCSNodeGroupComputeuse a hardcoded AMI ID (ami-0bf564070da947c48). This AMI is region-specific and will fail in other regions.Proposed fix
PCSNodeGroupLogin: Type: AWS::PCS::ComputeNodeGroup Properties: ... - AmiId: ami-0bf564070da947c48 + AmiId: !Ref AmiId InstanceConfigs:Apply the same change to
PCSNodeGroupComputeat line 567.Also applies to: 567-567
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/pcs.yaml` at line 546, Both node group resources are using a hardcoded AMI ID instead of the template parameter; update the EC2/AutoScaling/LaunchTemplate property for PCSNodeGroupLogin and PCSNodeGroupCompute to reference the template parameter AmiId (not the literal "ami-0bf564070da947c48") so the AMI is region-agnostic; replace the hardcoded value with a CloudFormation parameter reference (using Ref or !Ref to AmiId) in the LaunchTemplate/InstanceLaunchTemplate/LaunchTemplateData/ImageId (or equivalent ImageId field) for both PCSNodeGroupLogin and PCSNodeGroupCompute.cft-templates/pcs.yaml-194-197 (1)
194-197:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHardcoded security group ID will fail in other AWS accounts.
The default value
sg-00db64cee7ba85cb2is account-specific and will cause stack creation failures when deployed to different accounts or environments.Proposed fix
VpcDefaultSecurityGroupId: Type: AWS::EC2::SecurityGroup::Id Description: Cluster VPC 'default' security group. Make sure you choose the one from your cluster VPC! - Default: sg-00db64cee7ba85cb2 # Default from main stack's SecurityGroup paramRemove the default or use a dynamic lookup via a Lambda-backed custom resource.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/pcs.yaml` around lines 194 - 197, The parameter VpcDefaultSecurityGroupId is hardcoded to an account-specific value; remove the Default: sg-00db64cee7ba85cb2 and either (A) make VpcDefaultSecurityGroupId a required parameter with no Default so callers supply the correct SG for their account, or (B) implement a Lambda-backed custom resource that runs DescribeSecurityGroups filtered by VpcId and group-name "default" and returns the SecurityGroupId, then have the template set VpcDefaultSecurityGroupId from that custom resource output. Ensure the custom resource Lambda uses the cluster VpcId input (or the existing Vpc parameter) and expose the resulting value for any resources that reference VpcDefaultSecurityGroupId.cft-templates/pcs_quick_lt.yaml-302-308 (1)
302-308:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOverly permissive KMS policy grants full access to all keys in the account.
kms:*onResource: "*"allows instances to decrypt, delete, or modify any KMS key in the account. This violates least-privilege and poses a significant security risk if an instance is compromised.Proposed fix - Scope to required actions
- PolicyDocument: Version: "2012-10-17" Statement: - - Action: "kms:*" + - Action: + - kms:Decrypt + - kms:GenerateDataKey + - kms:DescribeKey Effect: Allow - Resource: "*" + Resource: !Sub "arn:aws:kms:${AWS::Region}:${AWS::AccountId}:key/*" PolicyName: PcsKmsFullAccessPolicyScope to only the required KMS actions (e.g.,
Decrypt,GenerateDataKey) and restrict to specific key ARNs if possible.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/pcs_quick_lt.yaml` around lines 302 - 308, The PcsKmsFullAccessPolicy currently grants overly broad permissions (PolicyDocument with Action "kms:*" and Resource "*"); replace this with a least-privilege policy that lists only required KMS actions (e.g., "kms:Decrypt", "kms:GenerateDataKey", "kms:Encrypt" as needed) and narrow the Resource to the specific KMS key ARNs used by this stack (or use a parameter like PcsKmsKeyArn) instead of "*"; update the PolicyName block to reference the tightened PolicyDocument and ensure any callers (roles or instance profiles) still reference PcsKmsFullAccessPolicy after the change.cft-templates/sagemakerUserProfile.yaml-323-338 (1)
323-338:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDelete flow will timeout if app/space doesn't exist or is already deleted.
The
wait_for_statusfunction catchesClientError(includingResourceNotFound) but continues polling indefinitely. When waiting for"Deleted"status during delete, if the resource is already gone,describe_app/describe_spacewill throwResourceNotFound, causing a 600-second timeout instead of succeeding immediately.🐛 Proposed fix to handle ResourceNotFound as deleted
def wait_for_status(describe_fn, status_key, desired, max_wait=600, poll_interval=10, **kwargs): """Generic waiter for any SageMaker resource status""" start = time.time() while time.time() - start < max_wait: try: resp = describe_fn(**kwargs) status = resp[status_key] print(f"Current {status_key}: {status}") if status == desired: return True if status == "Failed": raise Exception(f"Resource failed with status 'Failed'") except ClientError as e: + if e.response["Error"]["Code"] == "ResourceNotFound": + if desired == "Deleted": + print("Resource not found - treating as deleted") + return True + raise print(f"Error while waiting: {e}") time.sleep(poll_interval) raise Exception(f"Timeout waiting for {status_key}={desired}")Also applies to: 431-435
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/sagemakerUserProfile.yaml` around lines 323 - 338, The waiter wait_for_status currently swallows ClientError and keeps polling, causing delete flows to timeout when describe_* raises ResourceNotFound; update wait_for_status (the function named wait_for_status used by describe_app/describe_space callers) to detect ClientError with Error Code "ResourceNotFound" and, if the desired status is "Deleted", return True immediately; for other ClientErrors preserve existing behavior (log and continue or re-raise as appropriate) so only ResourceNotFound during a delete returns success.cft-templates/sagemakerUserProfile.yaml-387-402 (1)
387-402:⚠️ Potential issue | 🟠 Major | ⚡ Quick winApp creation is not idempotent—Update will fail if app exists.
The Space creation checks for existence first (line 366), but app creation unconditionally calls
create_appwithout checking ifdefaultapp already exists. On CloudFormation Update, this will fail withResourceInUseException.🐛 Proposed fix to add idempotent app creation
# --- Create App --- + try: + sm.describe_app( + DomainId=domain_id, + SpaceName=space_name, + AppType="JupyterLab", + AppName="default" + ) + print("App 'default' already exists") + except ClientError as e: + if e.response["Error"]["Code"] == "ResourceNotFound": print(f"Creating JupyterLab app 'default' with tags") sm.create_app( DomainId=domain_id, SpaceName=space_name, AppType="JupyterLab", AppName="default", ResourceSpec={ "InstanceType": instance_type, "SageMakerImageArn": f"arn:aws:sagemaker:{region}:081325390199:image/sagemaker-distribution-cpu" }, Tags=[ *common_tags, {"Key": "InstanceType", "Value": instance_type} ] ) + else: + raise🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/sagemakerUserProfile.yaml` around lines 387 - 402, The app creation is not idempotent: before calling sm.create_app(...) for AppName="default" (using DomainId=domain_id, SpaceName=space_name, AppType="JupyterLab"), first check whether that app already exists (e.g., call sm.list_apps or sm.describe_app filtered by DomainId/SpaceName/AppName/AppType) and only call create_app if it is not present; alternatively catch ResourceInUseException around sm.create_app and treat it as success (log and continue). Update the logic around the create_app call so it performs the existence check or exception handling to make app creation idempotent.SRE/lambdaresources/package.json-5-5 (1)
5-5:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix
package.jsonentrypoint (main) to match existing file
SRE/lambdaresources/package.jsonsets"main": "index.mjs.mjs", butSRE/lambdaresources/only containsindex.mjs(andstartrunsnode index.mjs), so the manifest entrypoint is inconsistent and likely breaks tooling that readsmain.Suggested fix
- "main": "index.mjs.mjs", + "main": "index.mjs",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/package.json` at line 5, The package.json "main" field is incorrect ("index.mjs.mjs") and must point to the actual entry file; open package.json and update the "main" property to "index.mjs" so it matches the existing file and the start script that runs node index.mjs; verify no other references expect the erroneous name.SRE/lambdaresources/index.mjs-33-35 (1)
33-35:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t throw from the
archiver'error'listener; propagate via a Promise.The
throwinsidearchive.on('error', ...)won’t be reliably caught by the surrounding async handler’stry/catch, so failures may bypass your500response (or crash with an unhandled error). Reject a Promise from the listener andawaitit alongside the upload.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/index.mjs` around lines 33 - 35, The archiver error handler currently throws inside archive.on('error', ...) which can escape the async try/catch; instead create a Promise (e.g., const archiveErrorPromise = new Promise((_, reject) => archive.on('error', reject))) and reject from the listener, then await that Promise together with the existing upload promise (or use Promise.race/Promise.all) in the async flow so archiver errors are propagated as rejections you can catch; update references to archive.on('error', ...) and any upload awaiting logic to await the new archiveErrorPromise.SRE/lambdaresources/index.mjs-18-26 (1)
18-26:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPaginate
listObjectsV2results and guard for missingContentsbefore building the ZIP.
listObjectsV2returns up to 1,000 keys per call; without pagination (IsTruncated/NextContinuationToken), objects beyond the first page won’t be added. Also, on empty prefixesContentscan be missing, solistedObjects.Contents.lengthand iteratinglistedObjects.Contentscan fail.Suggested fix
const listParams = { Bucket: sourceBucket, Prefix: sourcePrefix }; - const listedObjects = await s3.listObjectsV2(listParams).promise(); + const contents = []; + let continuationToken; + + do { + const page = await s3.listObjectsV2({ + ...listParams, + ContinuationToken: continuationToken + }).promise(); + contents.push(...(page.Contents ?? [])); + continuationToken = page.IsTruncated ? page.NextContinuationToken : undefined; + } while (continuationToken); - if (listedObjects.Contents.length === 0) { + if (contents.length === 0) { console.log('No files found in the source folder'); return { statusCode: 404, body: 'No files found in the source folder' }; } @@ - for (const obj of listedObjects.Contents) { + for (const obj of contents) { const fileKey = obj.Key;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/index.mjs` around lines 18 - 26, The code calling s3.listObjectsV2 (using listParams/listedObjects) must be converted to a paginated loop that follows IsTruncated/NextContinuationToken and accumulates all returned objects into a single array before building the ZIP; also guard for missing or undefined listedObjects.Contents by treating it as an empty array (e.g., default to []) and only proceeding to ZIP if the accumulated array length > 0. Update the block that checks listedObjects.Contents.length and the iteration over listedObjects.Contents to use the accumulated results and handle the case where Contents is absent.SRE/lambdaresources/index.mjs-7-13 (1)
7-13:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard request parsing and fix pagination + archiver error propagation
- Move
JSON.parse(event.body)and required-field validation inside thetryblock so malformed/missing bodies return a controlled400instead of failing before error handling.listObjectsV2is only called once; if there are more than one page of keys, the code will zip an incomplete set—paginate usingIsTruncated+NextContinuationToken.- Throwing inside
archive.on('error', ...)won’t be reliably caught by the outertry/catch; route theerrorevent into the awaited Promise path (e.g., reject a wrapper Promise).Suggested fix
export const handler = async (event) => { - const reqBody = JSON.parse(event.body); - const sourceBucket = reqBody.sourceBucket; - const sourcePrefix = reqBody.sourcePrefix; - const destinationBucket = reqBody.destinationBucket; - const destinationPrefix = reqBody.destinationPrefix; - try { + const reqBody = + typeof event.body === 'string' + ? JSON.parse(event.body) + : (event.body ?? {}); + const { sourceBucket, sourcePrefix, destinationBucket, destinationPrefix } = reqBody; + + if (!sourceBucket || !sourcePrefix || !destinationBucket || !destinationPrefix) { + return { + statusCode: 400, + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ + message: "sourceBucket, sourcePrefix, destinationBucket, and destinationPrefix are required" + }), + isBase64Encoded: false + }; + } + console.log(`Zipping and uploading files from ${sourcePrefix} to ${destinationPrefix}`);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/index.mjs` around lines 7 - 13, Move JSON.parse(event.body) and required-field checks into the try block inside handler so malformed or missing bodies produce a controlled 400 response; implement full pagination when gathering object keys by repeatedly calling listObjectsV2 using IsTruncated and NextContinuationToken until all pages are fetched (update the logic that currently calls listObjectsV2 once to accumulate keys across pages); and convert the archiver error event handling so archive.on('error', ...) rejects a Promise that the handler awaits (e.g., wrap the archiver streaming/finish/error events in a Promise and reject on error) instead of throwing inside the event handler, so errors propagate into the outer try/catch.
🟡 Minor comments (6)
cft-templates/bedrock.yaml-66-72 (1)
66-72:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove duplicate
iam:CreateRoleaction.The action
iam:CreateRoleappears twice in the statement (lines 66 and 72). Remove the duplicate to clean up the policy.🧹 Proposed fix
- Effect: Allow Action: - - "iam:CreateRole" - "iam:AttachRolePolicy" - "iam:PutRolePolicy" - "lambda:ListFunctions" - "lambda:GetFunction" - "lambda:InvokeFunction" - "iam:CreateRole"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/bedrock.yaml` around lines 66 - 72, The policy statement contains a duplicated action "iam:CreateRole"; remove the second occurrence of "iam:CreateRole" from the actions array so each IAM action (e.g., "iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy", "lambda:ListFunctions", "lambda:GetFunction", "lambda:InvokeFunction") appears only once in the statement.dump/standardcatalogitems.json-1531-1534 (1)
1531-1534:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPoint the Nginx catalog item at its own docs.
The
DetailsLinkforNginx Web Serverstill targets the RStudio page, so users will be sent to the wrong documentation from the catalog.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dump/standardcatalogitems.json` around lines 1531 - 1534, The DetailsLink for the "Nginx Web Server" catalog item is incorrectly pointing to the RStudio docs; update the JSON entry with Key "DetailsLink" for the Nginx item so its Value points to the Nginx documentation URL (replace the current RStudio URL with the Nginx docs page, e.g. the project's nginx docs path) to ensure the catalog links to the correct documentation.config/settings-config.json-1068-1073 (1)
1068-1073:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the IGV icon path.
Line 1072 uses
./../assets/images/Screen-icon.png, while the rest of the file uses../../assets/.... This one resolves differently and is likely to break the running-state icon for IGV only.Suggested fix
- "imageUrl": "./../assets/images/Screen-icon.png" + "imageUrl": "../../assets/images/Screen-icon.png"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config/settings-config.json` around lines 1068 - 1073, The imageUrl under the "running" -> "connect" entry uses a wrong relative path ("./../assets/images/Screen-icon.png"); update the "imageUrl" value for that object to use the consistent "../../assets/images/Screen-icon.png" path so it matches the rest of the file and resolves correctly (look for the "running" object and its "connect" array containing the "menu": "Remote Desktop" entry and change its "imageUrl").upload-assets.sh-7-12 (1)
7-12:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing exit code on error path.
When a required utility is not found,
exitshould include an explicit non-zero code to signal failure to calling scripts or CI pipelines.Proposed fix
if ! command -v "$program" >/dev/null 2>&1; then echo "$program not found. This Script needs jq and aws cli. Please install the application/s and restart deployment, Exiting." - exit + exit 1 else🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@upload-assets.sh` around lines 7 - 12, The failure path in upload-assets.sh uses a bare exit which returns 0; update the error handling in the block that checks the variable program (the `if ! command -v "$program" ...` branch) to call exit with a non-zero code (e.g., `exit 1`) so calling scripts/CI receive a failure status; keep the existing error message (optionally redirect it to stderr) and leave the success branch (`echo "$program found"`) unchanged.scripts/bootstrap-scripts/bootstrap.sh-89-90 (1)
89-90:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse POSIX-compliant
||instead of-oin test expression.The
-ooperator inside[ ]is not well defined by POSIX and can behave unexpectedly with certain string values.Proposed fix
# Exit if no S3 mounts were specified -[ -z "$S3_MOUNTS" -o "$S3_MOUNTS" = "[]" ] && exit 0 +[ -z "$S3_MOUNTS" ] || [ "$S3_MOUNTS" = "[]" ] && exit 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/bootstrap-scripts/bootstrap.sh` around lines 89 - 90, Replace the non-POSIX test using "-o" with a POSIX-compliant compound test: evaluate each predicate separately and combine them with the shell OR operator. For the S3_MOUNTS check, change the line using [ -z "$S3_MOUNTS" -o "$S3_MOUNTS" = "[]" ] to a form like if [ -z "$S3_MOUNTS" ] || [ "$S3_MOUNTS" = "[]" ]; then exit 0; fi (or use ([ -z "$S3_MOUNTS" ] || [ "$S3_MOUNTS" = "[]" ]) && exit 0) so the S3_MOUNTS empty or equals "[]" condition is evaluated portably.cft-templates/sagemakerUserProfile.yaml-311-311 (1)
311-311:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpgrade Lambda runtime to a supported Python version (python3.9 → python3.11/3.12)
cft-templates/sagemakerUserProfile.yamlsetsRuntime: python3.9(line 311). AWS Lambda Python 3.9 reached end-of-support on Dec 15, 2025; AWS blocks new function creation starting Feb 1, 2027 and updates starting Mar 3, 2027. Upgrade topython3.11orpython3.12for continued security/support.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/sagemakerUserProfile.yaml` at line 311, The template currently sets the Lambda runtime to "python3.9" (Runtime: python3.9); update that Runtime property in the AWS::Lambda::Function resource to a supported value such as "python3.11" or "python3.12", then run a deployment/build to ensure any dependencies, layers, or packaging (e.g., compiled wheels) are compatible and adjust them if necessary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b61b836-60f5-4e51-af4f-fd4a292bdc62
⛔ Files ignored due to path filters (1)
SRE/lambdaresources/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (21)
SRE/Egress/Setup-templates/egressresources.ymlSRE/lambdaresources/index.mjsSRE/lambdaresources/package.jsoncft-templates/awsSagemakerDomainAI.yamlcft-templates/bedrock.yamlcft-templates/ec2-nginxWebServer.ymlcft-templates/ec2-rockyLinux.ymlcft-templates/pcs.yamlcft-templates/pcs_quick_lt.yamlcft-templates/sagemakerUserProfile.yamlcft-templates/windows-remotedesktop.ymlconfig/access-policy.jsonconfig/access-secure-policy.jsonconfig/config.jsonconfig/settings-config.jsondeploy.shdump/configs.jsondump/standardcatalogitems.jsonrg_main_stack.ymlscripts/bootstrap-scripts/bootstrap.shupload-assets.sh
| SecurityGroupIngress: | ||
| - IpProtocol: tcp | ||
| FromPort: 0 | ||
| ToPort: 65535 | ||
| CidrIp: 0.0.0.0/0 # Allow all incoming traffic (adjust as needed for security) | ||
| SecurityGroupEgress: | ||
| # Allow all outbound traffic so instances can reach AWS services and the internet | ||
| - IpProtocol: '-1' | ||
| CidrIp: 0.0.0.0/0 |
There was a problem hiding this comment.
Security group allows all TCP ports from anywhere – critical security risk.
The ClusterSecurityGroup allows inbound TCP on ports 0-65535 from 0.0.0.0/0. This exposes all services running on cluster nodes to the entire internet, creating a severe attack surface.
Proposed fix - Remove overly permissive ingress
Tags:
- Key: cost_resource
Value: !Sub '${AWS::StackName}'
- Key: Name
Value: !Sub 'cluster-${AWS::StackName}'
- SecurityGroupIngress:
- - IpProtocol: tcp
- FromPort: 0
- ToPort: 65535
- CidrIp: 0.0.0.0/0 # Allow all incoming traffic (adjust as needed for security)
SecurityGroupEgress:The self-referencing ingress rule (ClusterAllowAllInboundFromSelf) and the separate InboundSshSecurityGroup already provide the necessary intra-cluster communication and SSH access.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| SecurityGroupIngress: | |
| - IpProtocol: tcp | |
| FromPort: 0 | |
| ToPort: 65535 | |
| CidrIp: 0.0.0.0/0 # Allow all incoming traffic (adjust as needed for security) | |
| SecurityGroupEgress: | |
| # Allow all outbound traffic so instances can reach AWS services and the internet | |
| - IpProtocol: '-1' | |
| CidrIp: 0.0.0.0/0 | |
| SecurityGroupEgress: | |
| # Allow all outbound traffic so instances can reach AWS services and the internet | |
| - IpProtocol: '-1' | |
| CidrIp: 0.0.0.0/0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cft-templates/pcs_quick_lt.yaml` around lines 209 - 217, The
SecurityGroupIngress rule on ClusterSecurityGroup currently permits tcp FromPort
0 ToPort 65535 from 0.0.0.0/0; remove or tighten this overly-permissive ingress
in the SecurityGroupIngress block for ClusterSecurityGroup (delete the
0–65535/0.0.0.0/0 rule or replace it with specific allowed ports such as 22 and
any required application ports), relying on the existing
ClusterAllowAllInboundFromSelf for intra-cluster traffic and
InboundSshSecurityGroup for SSH access; validate the template after change to
ensure no other resources depend on the open-all rule.
| - | | ||
| cat > /etc/amazon/pcs/bootstrap_config.json <<'EOF' | ||
| { "cluster": { "version": "Slurm_24.11", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}} | ||
| EOF |
There was a problem hiding this comment.
Bootstrap config hardcodes Slurm version, ignoring SlurmVersion parameter.
Same issue as in pcs.yaml: the bootstrap config embeds "version": "Slurm_24.11" while the SlurmVersion parameter allows other versions. This will cause cluster misconfiguration.
Proposed fix
- |
cat > /etc/amazon/pcs/bootstrap_config.json <<'EOF'
- { "cluster": { "version": "Slurm_24.11", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}}
+ { "cluster": { "version": "Slurm_${SlurmVersion}", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}}
EOFNote: Remove the <<'EOF' heredoc syntax when using CloudFormation !Sub interpolation, or use <<EOF (without quotes) to enable variable expansion.
Also applies to: 448-451
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cft-templates/pcs_quick_lt.yaml` around lines 376 - 379, The bootstrap JSON
hardcodes "version": "Slurm_24.11" which ignores the SlurmVersion parameter;
update the heredoc to allow CloudFormation substitution and plug in the
SlurmVersion parameter (e.g. replace the literal "Slurm_24.11" with
${SlurmVersion} via !Sub), and remove the single-quoted heredoc (use <<EOF or
inline !Sub) so variables expand; apply the same change for the second
occurrence that sets the bootstrap_config.json.
| exec > >(tee -a /var/log/amazon/pcs/bootstrap.log | logger -t user-data -s 2>/dev/ttyS0) 2>&1 | ||
|
|
||
| mkdir -p /etc/amazon/pcs | ||
| echo '{ "cluster": { "version": "Slurm_24.11", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}}' > /etc/amazon/pcs/bootstrap_config.json |
There was a problem hiding this comment.
Bootstrap config hardcodes Slurm version, ignoring SlurmVersion parameter.
The bootstrap config JSON embeds "version": "Slurm_24.11" but users can select 23.11 or 24.05 via the SlurmVersion parameter. This mismatch will cause bootstrap failures when non-default versions are selected.
Proposed fix for LoginLaunchTemplate UserData (line 427)
- echo '{ "cluster": { "version": "Slurm_24.11", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}}' > /etc/amazon/pcs/bootstrap_config.json
+ echo '{ "cluster": { "version": "Slurm_${SlurmVersion}", "disable_multithreading": true, "scheduler": "slurm", "base_os": "alinux2", "cluster_id": "${AWS::StackName}", "slurm": { "endpoint": "https://pcs.${AWS::Region}.api.aws" }}}' > /etc/amazon/pcs/bootstrap_config.jsonApply the same fix to both LoginLaunchTemplate (line 427) and ComputeLaunchTemplate (line 489).
Also applies to: 489-489
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cft-templates/pcs.yaml` at line 427, The bootstrap JSON in the
LoginLaunchTemplate and ComputeLaunchTemplate UserData hardcodes "version":
"Slurm_24.11" causing mismatches with the SlurmVersion parameter; update both
UserData snippets (LoginLaunchTemplate and ComputeLaunchTemplate) to inject the
selected SlurmVersion (e.g., using the CloudFormation parameter reference for
SlurmVersion/AWS::NoValue pattern you used elsewhere) so the JSON writes
"version": "Slurm_<SlurmVersion>" (or equivalent) into
/etc/amazon/pcs/bootstrap_config.json, preserving the surrounding JSON structure
and escaping so the template emits the correct string for any SlurmVersion
value.
| - kms:Decrypt | ||
| - kms:GetKeyPolicy | ||
| - kms:PutKeyPolicy | ||
| Resource: "*" |
There was a problem hiding this comment.
Scope KMS policy mutation more narrowly.
kms:PutKeyPolicy on "*" lets the portal role rewrite any customer-managed key policy in the account. That is a privilege-escalation path if the instance is ever compromised and is much broader than the S3 bucket-policy change described in this PR. Limit this to the specific keys the portal is expected to manage, or gate it with conditions on the allowed key set.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rg_main_stack.yml` around lines 130 - 133, The IAM statement currently grants
kms:PutKeyPolicy (and other KMS actions) on Resource:"*", which is overly broad;
change the statement that mentions kms:PutKeyPolicy (and optionally kms:Decrypt,
kms:GetKeyPolicy) so it only targets the specific customer-managed KMS key ARNs
the portal must manage (replace Resource: "*" with an array of the exact
arn:aws:kms:...:key/<key-id> values used by the portal) or add an explicit
Condition that limits allowed keys (e.g., ArnLike/StringEquals on kms:KeyId or
kms:ResourceArn such that only the portal-managed key ARNs are permitted);
update the policy block that lists kms:PutKeyPolicy to use those key
ARNs/condition instead of the wildcard.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/config.json`:
- Around line 3-5: The Express app is bound to localhost which prevents other
containers from reaching advertised endpoints; update the express.host setting
in config.json (the "express" object, key "host") to bind to all interfaces
(e.g. "0.0.0.0") or an appropriate container-facing address so endpoints like
http://sp2_cc-3102:3002/... are reachable from sibling containers while keeping
express.port unchanged.
- Around line 184-188: sessionConfig currently contains a checked-in static
"secret" value; remove the hardcoded sessionConfig.secret and change runtime
config to read the secret from an external source (e.g., environment variable
SESSION_SECRET or a secrets manager) so deployments inject the real value;
update the config loader to fallback to process.env.SESSION_SECRET (or Secrets
Manager API) when sessionConfig.secret is absent, and adjust deployment tooling
that writes configs (scripts/fixconfigs.sh) and docker-compose runtime mounts to
pass the secret via environment/secret mechanism rather than embedding it in
config.json.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
feat(ec2-rockyLinux): add password parameter and update user data script for ec2-user
sessionConfig secret should be replaced at deployment time. Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
SRE/lambdaresources/index.mjs (2)
43-45:⚠️ Potential issue | 🟠 Major | ⚡ Quick winArchive error handler throws inside event callback—error won't be caught.
Throwing from an
'error'event listener doesn't propagate to the surroundingtry/catch. The error will surface as an unhandled rejection (or crash), bypassing the 500 response. Consider rejecting a tracked promise or setting an error flag.🛡️ Proposed fix using a deferred rejection
+ let archiveError = null; + archive.on('error', (error) => { - throw new Error(`Archiving error: ${error.message}`); + archiveError = new Error(`Archiving error: ${error.message}`); + archive.abort(); }); archive.pipe(passThroughStream); // Start S3 upload process asynchronously const uploadParams = { Bucket: destinationBucket, Key: destinationPrefix, Body: passThroughStream, ContentType: 'application/zip' }; const uploadPromise = s3.upload(uploadParams).promise(); // Add files to archive for (const obj of contents) { + if (archiveError) throw archiveError; const fileKey = obj.Key;Alternatively, wrap the entire archive/upload section in a
new Promisethat rejects on the'error'event.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/index.mjs` around lines 43 - 45, The 'archive.on("error", (error) => { throw new Error(...) })' handler throws inside an event callback so it won't be caught by surrounding try/catch; change this to signal failure to the outer flow by rejecting a Promise or setting an error flag instead. Specifically, wrap the archive/upload flow in a new Promise and inside archive.on('error', ...) call reject(error) (or set a local error variable and ensure the outer Promise rejects after stream completion), and update any await logic to await that Promise so the error is propagated and results in the 500 response; locate the handler on the archive instance and the surrounding archive/upload logic to implement the Promise-based rejection.
34-37:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical:
listedObjectsis undefined—function will crash at runtime.The pagination logic populates
contents, but line 34 referenceslistedObjects.Contents.lengthwhich was never defined. This causes aReferenceErroron every invocation, preventing both the 404 early-return and the archiving path from ever executing.🐛 Proposed fix
- if (listedObjects.Contents.length === 0) { + if (contents.length === 0) { console.log('No files found in the source folder'); return { statusCode: 404, body: 'No files found in the source folder' }; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRE/lambdaresources/index.mjs` around lines 34 - 37, The code checks listedObjects.Contents but the pagination routine populates the variable named contents, so replace the undefined reference by using the populated variable (e.g., check contents.length) or wrap contents into an object (e.g., listedObjects = { Contents: contents }) before the existing if; update the conditional that returns the 404 to reference the correct symbol (contents or the newly-created listedObjects) so the function no longer throws a ReferenceError and correctly handles the empty-case in the archive flow.upload-assets.sh (1)
110-112:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing
-rflag may exclude subdirectories from dump archive.
zip dump.zip dump/*without-rwill not recursively include subdirectories. Ifdump/contains nested directories, their contents will be silently omitted, potentially causing incomplete deployments.🐛 Proposed fix
-zip dump.zip dump/* +zip -r dump.zip dump/* unzip -l dump.zip🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@upload-assets.sh` around lines 110 - 112, The zip command currently uses "zip dump.zip dump/*" which omits nested directories; change it to use the recursive flag (e.g., "zip -r dump.zip dump") so the entire dump/ directory and its subdirectories are included when creating dump.zip, then keep the subsequent unzip -l and aws s3 cp steps unchanged; update the command that references dump.zip and dump/* in the script accordingly.cft-templates/windows-remotedesktop.yml (2)
24-31:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefault CIDR
0.0.0.0/0exposes RDP and WinRM to the internet.The default value opens ports 3389 (RDP), 5986 (WinRM over HTTPS), and 8443 (DCV) to the entire internet. This is a significant security risk, especially for Windows instances which are frequent targets for brute-force attacks.
Consider requiring users to explicitly specify their IP range or using a more restrictive default.
🔒 Suggested fix: Remove default or use placeholder
AllowedSSHLocation: Description: The IP address range that can be used to SSH to the EC2 instances Type: String MinLength: "9" MaxLength: "18" - Default: 0.0.0.0/0 AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2}) ConstraintDescription: must be a valid IP CIDR range of the form x.x.x.x/x.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/windows-remotedesktop.yml` around lines 24 - 31, The AllowedSSHLocation parameter currently defaults to 0.0.0.0/0 which exposes RDP/WinRM/DCV to the internet; remove the Default value (or replace it with a non-routable placeholder like "REPLACE_ME" or an empty string) and update the parameter validation/description so users must explicitly supply a CIDR (alter AllowedSSHLocation's Default and Description and keep AllowedPattern/ConstraintDescription as-is) to prevent an insecure open-default configuration.
78-95:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUserData lacks error handling; failures before
set_user_token.batwon't be signaled.PowerShell doesn't stop on errors by default. If
net userfails (e.g., password doesn't meet complexity requirements), the script continues and$lastexitcodecaptures only the exit code of the most recent native command (set_user_token.bat), potentially signaling success despite the password failure.🛠️ Suggested fix: Add error handling
Fn::Base64: !Sub | <powershell> # Redirect stdout and stderr to a log file and the console $scriptPath = "C:\Users\Administrator\log\user-data.log" - Start-Transcript -Path $scriptPath -Append + Start-Transcript -Path $scriptPath -Append + $ErrorActionPreference = "Stop" + $exitCode = 0 - net user Administrator ${Password} + try { + net user Administrator ${Password} + if ($LASTEXITCODE -ne 0) { throw "net user failed with exit code $LASTEXITCODE" } - echo "setting user token for nice-dcv" - C:\Users\Administrator\set_user_token.bat + echo "setting user token for nice-dcv" + C:\Users\Administrator\set_user_token.bat + if ($LASTEXITCODE -ne 0) { throw "set_user_token.bat failed with exit code $LASTEXITCODE" } + } catch { + Write-Error $_ + $exitCode = 1 + } echo "signal cft to success/failure" - cfn-signal.exe -e $lastexitcode --stack ${AWS::StackId} --resource EC2Instance --region ${AWS::Region} + cfn-signal.exe -e $exitCode --stack ${AWS::StackId} --resource EC2Instance --region ${AWS::Region} # End the transcript (stop logging) Stop-Transcript </powershell>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/windows-remotedesktop.yml` around lines 78 - 95, Make the UserData PowerShell script fail fast and always send the correct exit code to cfn-signal: set PowerShell to stop on errors (e.g., set $ErrorActionPreference = 'Stop'), wrap critical native calls (net user Administrator and C:\Users\Administrator\set_user_token.bat) in a try/catch, and in the catch call cfn-signal.exe with a non‑zero error code and then exit; after running the native set_user_token.bat, capture the native process result using $LASTEXITCODE and pass that value to cfn-signal.exe (instead of relying on a last-invoked cmd variable), then Stop-Transcript. Reference symbols: UserData block, Start-Transcript/Stop-Transcript, net user Administrator, C:\Users\Administrator\set_user_token.bat, cfn-signal.exe, and $LASTEXITCODE.
🧹 Nitpick comments (2)
upload-assets.sh (1)
51-57: ⚡ Quick winRedundant wait and dead-code error check.
aws cloudformation deployalready waits for the stack operation to complete by default. The subsequentwait stack-create-completeis redundant and will fail if this is an update (not create). Additionally, withset -eenabled, theif [ $? -ne 0 ]check is dead code—ifwaitfails, the script exits before reaching the check.♻️ Proposed fix
aws cloudformation deploy --template-file rg_deploy_bucket.yml --stack-name "$bucketstackname" \ - --parameter-overrides S3NewBucketName="$bucketname" - aws cloudformation wait stack-create-complete --stack-name "$bucketstackname" - if [ $? -ne 0 ]; then - echo "Error: Failed to create S3 bucket via CloudFormation." - exit 1 - fi + --parameter-overrides S3NewBucketName="$bucketname" \ + || { echo "Error: Failed to create S3 bucket via CloudFormation."; exit 1; } fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@upload-assets.sh` around lines 51 - 57, Remove the redundant explicit waiter and dead-code error check: delete the aws cloudformation wait stack-create-complete --stack-name "$bucketstackname" line and the following if [ $? -ne 0 ]; then ... fi block; rely on aws cloudformation deploy (and the script's set -e) to surface failures instead. This touches the aws cloudformation deploy invocation and the wait/exit check surrounding "$bucketstackname" and "$bucketname".cft-templates/windows-remotedesktop.yml (1)
111-131: 💤 Low valueSecurity group description is misleading; port 80 purpose is unclear.
The
GroupDescriptionsays "Enable SSH access" but this is a Windows instance that uses RDP, not SSH. Additionally, port 80 (HTTP) is opened but DCV typically only needs port 8443 (HTTPS). If port 80 is unnecessary, consider removing it to reduce the attack surface.📝 Suggested fix
InstanceSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: - GroupDescription: Enable SSH access + GroupDescription: Enable RDP and DCV access for Windows Remote Desktop SecurityGroupIngress: - - IpProtocol: tcp - FromPort: "80" - ToPort: "80" - CidrIp: !Ref "AllowedSSHLocation" - IpProtocol: tcp FromPort: "3389"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cft-templates/windows-remotedesktop.yml` around lines 111 - 131, The security group InstanceSecurityGroup has a misleading GroupDescription ("Enable SSH access") and an unnecessary ingress rule for port 80; update GroupDescription to reflect RDP/management access (e.g., "Enable RDP and management ports") and remove the SecurityGroupIngress entry that uses FromPort: "80" ToPort: "80"; keep the RDP (3389), WinRM/management (5986) and DCV (8443) rules as needed and ensure the ingress CIDR reference (AllowedSSHLocation) is still appropriate or renamed if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cft-templates/windows-remotedesktop.yml`:
- Around line 24-31: The AllowedSSHLocation parameter currently defaults to
0.0.0.0/0 which exposes RDP/WinRM/DCV to the internet; remove the Default value
(or replace it with a non-routable placeholder like "REPLACE_ME" or an empty
string) and update the parameter validation/description so users must explicitly
supply a CIDR (alter AllowedSSHLocation's Default and Description and keep
AllowedPattern/ConstraintDescription as-is) to prevent an insecure open-default
configuration.
- Around line 78-95: Make the UserData PowerShell script fail fast and always
send the correct exit code to cfn-signal: set PowerShell to stop on errors
(e.g., set $ErrorActionPreference = 'Stop'), wrap critical native calls (net
user Administrator and C:\Users\Administrator\set_user_token.bat) in a
try/catch, and in the catch call cfn-signal.exe with a non‑zero error code and
then exit; after running the native set_user_token.bat, capture the native
process result using $LASTEXITCODE and pass that value to cfn-signal.exe
(instead of relying on a last-invoked cmd variable), then Stop-Transcript.
Reference symbols: UserData block, Start-Transcript/Stop-Transcript, net user
Administrator, C:\Users\Administrator\set_user_token.bat, cfn-signal.exe, and
$LASTEXITCODE.
In `@SRE/lambdaresources/index.mjs`:
- Around line 43-45: The 'archive.on("error", (error) => { throw new Error(...)
})' handler throws inside an event callback so it won't be caught by surrounding
try/catch; change this to signal failure to the outer flow by rejecting a
Promise or setting an error flag instead. Specifically, wrap the archive/upload
flow in a new Promise and inside archive.on('error', ...) call reject(error) (or
set a local error variable and ensure the outer Promise rejects after stream
completion), and update any await logic to await that Promise so the error is
propagated and results in the 500 response; locate the handler on the archive
instance and the surrounding archive/upload logic to implement the Promise-based
rejection.
- Around line 34-37: The code checks listedObjects.Contents but the pagination
routine populates the variable named contents, so replace the undefined
reference by using the populated variable (e.g., check contents.length) or wrap
contents into an object (e.g., listedObjects = { Contents: contents }) before
the existing if; update the conditional that returns the 404 to reference the
correct symbol (contents or the newly-created listedObjects) so the function no
longer throws a ReferenceError and correctly handles the empty-case in the
archive flow.
In `@upload-assets.sh`:
- Around line 110-112: The zip command currently uses "zip dump.zip dump/*"
which omits nested directories; change it to use the recursive flag (e.g., "zip
-r dump.zip dump") so the entire dump/ directory and its subdirectories are
included when creating dump.zip, then keep the subsequent unzip -l and aws s3 cp
steps unchanged; update the command that references dump.zip and dump/* in the
script accordingly.
---
Nitpick comments:
In `@cft-templates/windows-remotedesktop.yml`:
- Around line 111-131: The security group InstanceSecurityGroup has a misleading
GroupDescription ("Enable SSH access") and an unnecessary ingress rule for port
80; update GroupDescription to reflect RDP/management access (e.g., "Enable RDP
and management ports") and remove the SecurityGroupIngress entry that uses
FromPort: "80" ToPort: "80"; keep the RDP (3389), WinRM/management (5986) and
DCV (8443) rules as needed and ensure the ingress CIDR reference
(AllowedSSHLocation) is still appropriate or renamed if necessary.
In `@upload-assets.sh`:
- Around line 51-57: Remove the redundant explicit waiter and dead-code error
check: delete the aws cloudformation wait stack-create-complete --stack-name
"$bucketstackname" line and the following if [ $? -ne 0 ]; then ... fi block;
rely on aws cloudformation deploy (and the script's set -e) to surface failures
instead. This touches the aws cloudformation deploy invocation and the wait/exit
check surrounding "$bucketstackname" and "$bucketname".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c3fd259a-21c3-430f-bc5a-ebb79171b23c
📒 Files selected for processing (11)
SRE/lambdaresources/index.mjsSRE/lambdaresources/package.jsoncft-templates/ec2-nginxWebServer.ymlcft-templates/ec2-rockyLinux.ymlcft-templates/windows-remotedesktop.ymlconfig/config.jsonconfig/settings-config.jsondeploy.shdump/configs.jsonscripts/bootstrap-scripts/bootstrap.shupload-assets.sh
🚧 Files skipped from review as they are similar to previous changes (8)
- deploy.sh
- SRE/lambdaresources/package.json
- scripts/bootstrap-scripts/bootstrap.sh
- dump/configs.json
- config/config.json
- cft-templates/ec2-nginxWebServer.yml
- config/settings-config.json
- cft-templates/ec2-rockyLinux.yml
The image should be taken from the region in which the lambda is created. Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This change is
Summary by CodeRabbit
New Features
Chores