This page compiles a list of common troubleshooting steps found during development and administration of executors.
To debug problems you might face with an executor instance, you can apply the following steps.
First, prepare the instance:
- `ssh` into the host VM (see Connecting to cloud provider executor instances)
- `sudo su` to become the `root` user
- `systemctl stop executor` to stop the `executor` service
- `export $(cat /etc/systemd/system/executor.env | xargs)` to load the executor environment into your shell
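
The `export $(cat … | xargs)` step above can be tried on a throwaway file before touching the real one. The following is a minimal sketch, not part of the executor tooling: the `load_env_file` helper name is hypothetical, and it assumes the file contains only simple `KEY=value` lines whose values have no spaces or quotes.

```shell
# Hypothetical helper: export every KEY=value line of an env file into the
# current shell, using the same export/xargs approach as the step above.
load_env_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # xargs joins the lines into one space-separated list for `export`.
  # Assumption: values contain no spaces or quotes.
  export $(cat "$1" | xargs)
}
```

On an executor host this would be called as `load_env_file /etc/systemd/system/executor.env`.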
You can now run `executor validate`, which will inform you about any configuration issues. Fix any reported issues before proceeding.
The next step is to create a temporary Firecracker VM for debugging purposes.
NOTE: if the host VM is provisioned with the Sourcegraph terraform modules, the VMs may be configured to stop automatically. Refer to Disabling the auto-deletion of Executor VMs for information to prevent this.
Run one of the following `executor test-vm` commands to generate a test Firecracker VM:
# Test if a firecracker VM can be started
executor test-vm
# Test if a firecracker VM can be started and if a repository can be cloned into the VM's workspace
executor test-vm [--repo=github.com/sourcegraph/sourcegraph --revision=main]

The command will output a line like:
Success! Connect to the VM using
$ ignite attach executor-test-vm-0160f53f-e765-4481-a81e-aa3c704d07bd
Execute the generated `ignite attach <vm>` command to gain a shell to the Firecracker VM.
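
If the output of `executor test-vm` is captured in a script, the VM name can be pulled out of the `ignite attach` line. This is a rough sketch under the assumption that the output looks like the sample above; the `test_vm_name` helper is hypothetical.

```shell
# Hypothetical helper: read `executor test-vm` output on stdin and print the
# VM name found on the "$ ignite attach <vm>" line.
# Assumption: the line has the shape shown in the sample output above,
# optionally surrounded by whitespace.
test_vm_name() {
  sed -n 's/^[[:space:]]*\$ ignite attach \([^[:space:]]*\)[[:space:]]*$/\1/p'
}
```

For example, `executor test-vm | test_vm_name` would print only the name to pass to `ignite attach`.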
NOTE: These instructions are for users using the VMs deployed via the Terraform Modules
The Executor host VMs are configured to automatically tear themselves down once all jobs in the queue are completed. While this is desired behaviour under regular circumstances, it complicates debugging issues in the executor configuration or connections. To prevent the VMs from automatically stopping:
- `ssh` into the VM
- `sudo su` to become the `root` user
- Remove (or rename) the `/shutdown_executor.sh` file
The VM should now persist after all jobs are completed.
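
Renaming rather than deleting the file makes the change easy to undo. The sketch below illustrates that option; the `disable_auto_shutdown` helper and the `.disabled` suffix are hypothetical choices, not something the Terraform modules define.

```shell
# Hypothetical helper: rename the shutdown script so it can be restored later.
# The ".disabled" suffix is an arbitrary choice for this sketch.
disable_auto_shutdown() {
  local script="${1:-/shutdown_executor.sh}"
  if [ -f "$script" ]; then
    mv "$script" "$script.disabled"
    echo "renamed $script to $script.disabled"
  else
    echo "$script not found, nothing to do"
  fi
}
```

Run as `root` on the host VM, `disable_auto_shutdown` with no argument acts on `/shutdown_executor.sh`.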
If a server-side batch change fails unexpectedly, it's possible to recreate the generated Firecracker VM from the batch change execution.
NOTE: if the host VM is provisioned with the Sourcegraph terraform modules, the VMs may be configured to stop automatically. Refer to Disabling the auto-deletion of Executor VMs for information to prevent this.
- Navigate to the failed execution page of the Batch Change
- Select a failed Workspace on the left and click the `Diagnostics` link on the right pane
- In the modal, expand the `Setup` step by clicking the text or the expansion arrow on the right
- Copy the command from the final step of `Setup` starting with `ignite run`
- `ssh` into the host VM
- `sudo su` to become the `root` user
- `systemctl stop executor` to stop the `executor` service
- `export $(cat /etc/systemd/system/executor.env | xargs)` to load the executor environment into your shell
- Paste in the command copied from the batch change. You may need to remove the `--copy-files` and `--volumes` directives as those volumes and files may not exist on the VM any longer. Surround the `--kernel-args` arguments in quotes as well
- Execute the command and wait for the VM to start
- Run `ignite ps` to list all currently running VMs
- Run `ignite attach <vm id>` to get a shell to the running VM
An AMD64 (x86_64) Linux distro must be used due to the machine type of the VM. You may list available AMD64 distros with the following command, depending on your cloud provider:
gcloud compute images list --filter='(family~amd)'
aws ec2 describe-instances --filters "Name=architecture,Values=x86_64"

The log level of executors is set using the environment variable `SRC_LOG_LEVEL`. The following values are allowed:
- `dbug`
- `info`
- `warn` (default)
- `error`
- `crit`
Update or set this value in the shell profile or environment file of the instance, then run `executor run` to restart the instance.
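
A typo such as `debug` instead of `dbug` is easy to miss, so it can help to check the value before restarting. The following is a small sketch; the `is_valid_log_level` helper is hypothetical and only encodes the list of allowed values given above.

```shell
# Hypothetical helper: succeed only if the argument is one of the allowed
# SRC_LOG_LEVEL values listed above.
is_valid_log_level() {
  case "$1" in
    dbug|info|warn|error|crit) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is `is_valid_log_level "${SRC_LOG_LEVEL:-warn}" || echo "unexpected SRC_LOG_LEVEL" >&2`, where the fallback to `warn` mirrors the documented default.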
Verify that the Docker mirror instance is functioning properly by testing the following:
Run the following command on the executor instance to determine whether it responds properly:
# If EXECUTOR_DOCKER_REGISTRY_MIRROR_URL is set to a custom URL, replace the base endpoint with its value
curl http://localhost:5000/v2/_catalog

Verify that the registry is mounted under the expected path in the file system by running:
# This directory should always be mounted
ls /mnt/registry
# If jobs have been processed, the following path should exist
ls /mnt/registry/docker/registry/v2/repositories/<public repository name>

The following commands allow you to SSH into an executor instance, depending on your cloud platform of choice.
Find the name of an executor instance with
# optionally provide the --project flag
gcloud compute instances list --filter="name~executor" --format="get(name)"

Then, using the name of an instance, run
# optionally provide the --project flag
# use an identity-aware proxy tunnel with --tunnel-through-iap
gcloud compute ssh ${INSTANCE_NAME}

Alternatively, you may navigate to the compute instance in the GCP web console, where you will be able to connect with SSH in-browser.
In order to connect to an EC2 instance using SSH, you must have specified a key pair when the instance was launched. If you have not done so, you can connect to your instance through the web console instead.
Assuming you have specified the key pair, first run
chmod 400 path/to/key.pem

Find the public DNS value of your instance either through the web console or by using `aws ec2 describe-instances`, then run
ssh -i "path/to/key.pem" root@${INSTANCE_PUBLIC_DNS}

This section lists some common mistakes with environment variables. Some of these will be exposed by running `executor validate` on the executor instance.
| Env var | Common mistakes |
|---|---|
| `EXECUTOR_FRONTEND_URL` | No protocol included (e.g. `https://`) |
| `EXECUTOR_FRONTEND_PASSWORD` | Not set in `executor.accessToken` in the site config |
| `EXECUTOR_QUEUE_NAME` | Value doesn't match one of [`codeintel`, `batches`] |
|  | Value format can't be parsed by `time.ParseDuration` |
|  | Value format not recognized by virtual machine or Docker |
| `EXECUTOR_FIRECRACKER_DISK_SPACE` | Value format not recognized by virtual machine |
| `EXECUTOR_DOCKER_REGISTRY_MIRROR_URL` | Wrong IP or port specified |
| `EXECUTOR_DOCKER_HOST_MOUNT_PATH` | Workspace does not exist at provided mount path |
| `EXECUTOR_VM_STARTUP_SCRIPT_PATH` | Script does not exist at provided file path |
|  | Image does not exist for provided repository, name, or tag |
|  | `/metrics` path is included or wrong IP or port specified |
| `SRC_LOG_LEVEL` | Not set to one of [`dbug`, `info`, `warn`, `error`, `crit`] |
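
Two of the rows above are simple enough to check by hand in the shell after loading the environment. The sketch below covers only `EXECUTOR_FRONTEND_URL` and `EXECUTOR_QUEUE_NAME`; the `check_executor_env` helper is hypothetical, accepting `http://` as well as `https://` is an assumption, and `executor validate` remains the check to rely on.

```shell
# Hypothetical helper: print one line per problem found and return non-zero
# if any check fails. Only two rows of the table above are covered.
check_executor_env() {
  local rc=0
  # Assumption: either http:// or https:// counts as "protocol included".
  case "${EXECUTOR_FRONTEND_URL:-}" in
    http://*|https://*) ;;
    *) echo "EXECUTOR_FRONTEND_URL: no protocol included"; rc=1 ;;
  esac
  case "${EXECUTOR_QUEUE_NAME:-}" in
    codeintel|batches) ;;
    *) echo "EXECUTOR_QUEUE_NAME: must be codeintel or batches"; rc=1 ;;
  esac
  return "$rc"
}
```

The function reads the variables from the current shell, so it is meant to be run after the executor environment file has been loaded.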
The VM instance must support KVM. In effect, this means the instance must meet certain requirements depending on the Cloud provider in use.
Nested virtualization must be enabled on the machine.
- SSH into the executor instance (see Connecting to cloud provider executor instances)
- Run the following command. If it outputs anything other than `0`, nested virtualization is enabled: `grep -cw vmx /proc/cpuinfo`
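
The same `grep` check can be wrapped so that it yields an exit status for use in scripts. This is a sketch only: both helper names are hypothetical, and the optional file argument exists just so the functions can be tried against a copy of `/proc/cpuinfo`.

```shell
# Hypothetical helper: count the lines that list the vmx CPU flag
# (the Intel VT-x flag), reading /proc/cpuinfo unless a file is given.
vmx_line_count() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
  grep -cw vmx "${1:-/proc/cpuinfo}" || true
}

# Hypothetical helper: succeed when at least one line lists the vmx flag.
nested_virtualization_enabled() {
  local count
  count="$(vmx_line_count "$@")"
  # An empty count means the file could not be read; treat that as "no".
  [ -n "$count" ] && [ "$count" -gt 0 ]
}
```

For example, `nested_virtualization_enabled || echo "vmx flag not found" >&2`.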
Verify that the machine type in use is of type `.metal` (e.g. `m5.metal`).
iptables provides network isolation, security, and regulated access for Firecracker VMs. It implements NAT of private IP addresses for each VM, and allows forwarding only specific ports to VMs. It also blocks all other traffic, and prevents IP spoofing.
| Description | Purpose | Relevant rules |
|---|---|---|
| DNS traffic | DNS resolution | iptables -A CNI-ADMIN -p udp --dport 53 -j ACCEPT |
| Host to guest, established connections guest to host | SSH access | iptables -A INPUT -d 10.61.0.0/16 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT |
| From guest to gateway | Outbound internet access | |
| Description | Purpose | Relevant rules |
|---|---|---|
| Guest to host | Block outbound traffic (e.g. other executors or the Docker registry) | iptables -A INPUT -s 10.61.0.0/16 -j DROP |
| Guest to guest | Block outbound traffic to other Firecracker VMs | iptables -A INPUT -s 10.61.0.0/16 -d 10.61.0.0/16 -j DROP |
| Guest to link-local | Block Cloud provider resources such as instance metadata | iptables -A INPUT -s 10.61.0.0/16 -d 169.254.0.0/16 -j DROP |
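
When debugging network isolation, it can be useful to confirm that the blocking rules above are actually present. The sketch below compares them against a saved rules dump; it assumes the dump was produced with `iptables -S` and that each rule appears there exactly as written in the table minus the leading `iptables`. Both helper names are hypothetical.

```shell
# Hypothetical helper: succeed if the exact rule line ($2) appears in a
# rules dump ($1), for example one saved with `iptables -S > rules.txt`.
has_rule() {
  # -x matches whole lines, -F disables regex, -- protects the leading "-A".
  grep -qxF -- "$2" "$1"
}

# Hypothetical helper: print every blocking rule from the table above that
# is absent from the dump.
missing_block_rules() {
  local rule
  for rule in \
    "-A INPUT -s 10.61.0.0/16 -j DROP" \
    "-A INPUT -s 10.61.0.0/16 -d 10.61.0.0/16 -j DROP" \
    "-A INPUT -s 10.61.0.0/16 -d 169.254.0.0/16 -j DROP"
  do
    has_rule "$1" "$rule" || echo "missing: $rule"
  done
}
```

An empty result from `missing_block_rules rules.txt` would suggest all three rules are in place, subject to the formatting assumption above.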