Commit 5885893

ansible: Improve Nomad reliability
This commit improves the reliability of Nomad on machines that have just restarted. It incorporates a suggestion from @ericonr to change the unbound config so that consul can come up independently of unbound, and a change to the Nomad service to ensure that all of Nomad's dependencies are available when it starts. One major issue remains, tracked as hashicorp/nomad#11272: system tasks that have static port allocations can fail to reschedule if there is dangling state on a node after a reboot. This will likely require a patch in Nomad itself to resolve.
1 parent 3706430 commit 5885893

2 files changed

Lines changed: 12 additions & 0 deletions

ansible/roles/nomad/files/run

Lines changed: 11 additions & 0 deletions
@@ -2,5 +2,16 @@
 
 modprobe bridge
 
+# We need consul and unbound first
+sv check consul >/dev/null || exit 1
+sv check unbound >/dev/null || exit 1
+
+# We're racing DNS setup here because this *has* to be up before Nomad
+# starts in order for it to get fingerprinted.
+while ! getent hosts active.vault.service.consul ; do
+    sleep 5
+    unbound-control flush_negative
+done
+
 exec 2>&1
 exec nomad agent -config /etc/nomad/
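
The wait loop added to the run script retries until the lookup succeeds, with no upper bound. As a rough sketch only, a bounded variant could look like the following. The helper name `wait_for` and its arguments are hypothetical and not part of this commit; the idea is to give up after a fixed number of attempts and let the service supervisor restart the run script.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: retry a command until it
# succeeds or the attempt budget is spent.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Note: POSIX sh has no local variables, so attempts, delay and i are global.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

In the run script this might be called as `wait_for 60 5 getent hosts active.vault.service.consul || exit 1`, where the budget of 60 attempts is an arbitrary illustration. The sketch does not run `unbound-control flush_negative` between attempts as the committed loop does; to keep that behavior, the lookup and the flush would need to be wrapped in a single function passed as the command.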

ansible/roles/unbound/templates/unbound.conf.j2

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ server:
 
     do-not-query-localhost: no
     extended-statistics: yes
+    infra-keep-probing: yes
 
 {% if unbound_stub_consul|default(false) %}
     private-domain: consul.
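
As far as the unbound documentation describes it, `infra-keep-probing` makes the server continue probing upstream hosts it has marked as down rather than waiting for the host entry to expire, which fits the case where consul starts after unbound. A minimal sketch for confirming that a rendered config carries the setting follows. The function name `conf_has_option` is hypothetical and not part of this commit, and the config path in the usage note is an assumption.

```shell
#!/bin/sh
# Hypothetical check, not part of the commit: succeed if FILE sets KEY to
# VALUE on a line that is not commented out.
# Usage: conf_has_option FILE KEY VALUE
conf_has_option() {
    # Match optional indentation, "KEY:", optional spaces, then VALUE
    # followed by whitespace, a trailing comment, or end of line.
    grep -Eq "^[[:space:]]*$2:[[:space:]]*$3([[:space:]]|#|\$)" "$1"
}
```

For example, `conf_has_option /etc/unbound/unbound.conf infra-keep-probing yes` would succeed once the template has been deployed, assuming that is where the role writes the file. This is only a text match; `unbound-checkconf` is the tool for validating the config's syntax.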
