# Remote agent runtime template
# API-backed agents do not need direct Dragonfly / Redis credentials.
# Control channel
CONTROL_URL=http://nbhhzmw5m2fwpss44aktrgxjzwxnw5fssfzl76fg6edfzf4c6sy4ihad.onion
AGENT_MANAGEMENT_URL=http://nbhhzmw5m2fwpss44aktrgxjzwxnw5fssfzl76fg6edfzf4c6sy4ihad.onion
CONTROL_PROXY_URL=socks5h://127.0.0.1:9050
# Agent identity and enrollment
AGENT_ID=node-1
AGENT_NAME=node-1
AGENT_STATE_FILE=/var/lib/agentd/agent.env
AGENT_BOOTSTRAP_CODE=replace-with-issued-bootstrap-code
AGENT_POOL=
AGENT_TAGS=
# Agent behavior
AGENT_ENABLE_BOOTSTRAP=false
# Installer auto-tunes this to the host CPU count when unset:
# AGENT_MAX_ACTIVE_TASKS=
POLL_INTERVAL_SECONDS=15
# Installer auto-tunes this to the host CPU count when unset:
# AGENT_CONCURRENCY=
# SCANNER_DEFAULT_RATE is the rate the scanner adapter falls back to when
# the per-scan invocation omits `rate_limit` entirely; an explicit
# `rate_limit=0` from the control plane is passed through verbatim and
# does NOT consult this value. The worker also reports it as registration
# metadata so the control plane can decide what rate to ask for.
#
# With ANYSCAN_DYNAMIC_RATE_ENABLED=true (the installer default) the
# adapter treats this as the AIMD seed rate when no calibration has been
# learned yet, ramping each worker toward its natural pps ceiling and
# persisting the learned value to /var/lib/agentd/rate-calibration.json
# so subsequent scans skip relearning. The installer writes 500000
# (~256 Mbit/s of 64-byte SYNs), well under the 1 Gbit ceiling that
# reserve-control-bandwidth.sh enforces on the bulk class — the tc
# reservation is the primary safety net for control-plane heartbeats, not
# this value. Bench on c6in.xlarge (4 vCPU) shows the bundled scanner
# sustains ~1.7M pps with sender_threads=4 receivers=1 against
# unreachable space; AIMD converges there from 500k in 3-4 windows.
# Raise SCANNER_DEFAULT_RATE if you have measured headroom. Set to 0 to
# disable the fallback (--rate flag omitted).
# SCANNER_DEFAULT_RATE=500000
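# Worked bandwidth arithmetic for the numbers above (64-byte SYNs):
#   500000 pps  * 64 B * 8 = 256 Mbit/s   (installer default)
#   1700000 pps * 64 B * 8 ~ 870 Mbit/s   (c6in.xlarge bench ceiling)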
# SCANNER_SENDER_THREADS=
# Receiver count above 1 does not raise capture throughput because the
# AF_PACKET queue serializes per-receiver. Default is 1.
# SCANNER_RECEIVER_THREADS=1
# Installer auto-detects the default route interface and prefers /usr/bin/scanner
# when present unless you override these:
# SCANNER_INTERFACE=
# SCANNER_BIN=/opt/agentd/bin/scanner
# Multi-NIC sharded scanning. AWS instance types like c6in.metal expose up
# to 8 ENAs and the bundled scanner caps around 3M pps per AF_PACKET socket
# due to per-socket TX-lock contention; running one scanner process per
# ENA breaks that ceiling and lets the host approach the ENA spec
# aggregate (~14M pps for c6in.metal, anygpt-24 bench data). When this
# list contains 2+ entries the adapter splits the per-scan target_range
# into N sub-ranges and spawns one child scanner per ENA, each running
# its own AIMD loop with per-NIC calibration (rate-calibration.json keys
# are per-iface). The installer auto-discovers usable ENAs (UP,
# non-loopback, IPv4-addressed) and writes the list when more than one
# is attached; set to a single iface or leave empty to stay on the
# legacy single-NIC code path. Set ANYSCAN_DISABLE_MULTI_NIC_AUTO=true
# in the installer environment to suppress auto-discovery.
# ANYSCAN_SCANNER_INTERFACES=eth0,eth1,eth2,eth3,eth4,eth5,eth6,eth7
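# Illustrative split, assuming the adapter shards evenly (the exact
# sub-range math is the adapter's, not specified here): with
# ANYSCAN_SCANNER_INTERFACES=eth0,eth1 and target_range=10.0.0.0/8,
# eth0 gets 10.0.0.0/9 and eth1 gets 10.128.0.0/9, each child running
# its own AIMD loop under its own calibration key.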
# Cap on simultaneously active shard subprocesses spawned by the multi-NIC
# parent. anygpt-4 c6in.metal data: 4-NIC sustained 12.8M pps aggregate,
# 8-NIC regressed to 1.3M because shards 5-8 CPU-starved the others into
# AIMD-cratering on every window. The cap truncates ANYSCAN_SCANNER_INTERFACES
# to its first N entries so the adapter only fans out to N NICs even when
# more are attached; the remaining NICs sit idle for the scan but the
# kernel TX sweet spot is preserved. Set 0 or negative to disable the cap
# (legacy unbounded fan-out, NOT recommended on 8+ NIC hosts).
# ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4
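# Example: ANYSCAN_SCANNER_INTERFACES=eth0..eth7 with the cap at 4
# fans out shards on eth0-eth3 only; eth4-eth7 sit idle for the scan.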
# Dynamic AIMD port-scan rate adjustment. The adapter respawns the bundled
# scanner per window via its existing checkpoint+resume model, measures
# achieved pps from /sys/class/net/$IFACE/statistics/tx_packets and
# heartbeat slip via Python scheduler-jitter, then bumps additively on a
# clean window or halves on slip. Each adjustment emits a structured
# `[anyscan-rate-controller] metric=...` line on stderr for journal +
# scraper consumption.
# ANYSCAN_DYNAMIC_RATE_ENABLED=true
# ANYSCAN_RATE_WINDOW_SECONDS=30
# ANYSCAN_RATE_ADDITIVE_STEP=200000        # +pps per clean window
# ANYSCAN_RATE_MULTIPLICATIVE_FACTOR=0.5   # *factor on slip
# ANYSCAN_RATE_FLOOR=100000                # never drop below this
# ANYSCAN_RATE_CEILING=4000000             # rate-limit overhead hurts above this
# ANYSCAN_RATE_ACHIEVED_RATIO_FLOOR=0.9    # achieved/set must clear this for clean
# ANYSCAN_HEARTBEAT_LATENCY_THRESHOLD_MS=5000
# ANYSCAN_RATE_CALIBRATION_PATH=/var/lib/agentd/rate-calibration.json
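# Example trajectory under the defaults above (clean = achieved/set >= 0.9
# and no heartbeat slip; numbers illustrative, not bench data):
#   500000 -> clean -> 700000 -> clean -> 900000 -> slip -> 450000
# i.e. +ANYSCAN_RATE_ADDITIVE_STEP per clean window, *0.5 on slip, always
# clamped to [ANYSCAN_RATE_FLOOR, ANYSCAN_RATE_CEILING].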
# CPU-vs-network slip distinction. Heartbeat slip is ambiguous between
# CPU starvation (host pegged, scanner thread didn't get scheduled) and
# kernel TX overrun (NIC saturated). When loadavg/vcpu exceeds the
# threshold AND heartbeat slips, we attribute the slip to CPU and
# *hold* the rate instead of halving it — shrinking the rate doesn't
# free CPU and just wastes the headroom we already learned. The cap on
# concurrent shards (above) is the response that actually frees CPU.
# Both knobs default to values measured to keep us in the right
# classification at c6in.metal-class load profiles.
# ANYSCAN_CPU_LOAD_THRESHOLD=0.8 # loadavg/vcpu beyond which "CPU pressure"
# ANYSCAN_DROP_RATIO_THRESHOLD=0.001 # drop_ratio above this picks network as dominant
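# Decision sketch for a slipped window (pseudocode; the adapter's real
# control flow may order these checks differently):
#   if loadavg/vcpus > ANYSCAN_CPU_LOAD_THRESHOLD:  hold rate    # CPU-bound
#   elif drop_ratio > ANYSCAN_DROP_RATIO_THRESHOLD: halve rate   # network-bound
#   else:                                           halve rate   # default AIMD response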
# Per-instance starting-rate / floor / ceiling table. Detected from
# /sys/devices/virtual/dmi/id/product_name first; on EC2 VMs that
# usually returns "Amazon EC2" so the adapter falls back to IMDSv2
# (http://169.254.169.254/latest/meta-data/instance-type) with a
# 1s timeout. Set ANYSCAN_INSTANCE_TYPE to skip detection — the
# multi-NIC parent does this automatically so children inherit the
# resolved type without redoing IMDS. Operator overrides (any
# ANYSCAN_RATE_* knob set explicitly) always win over the table.
# ANYSCAN_INSTANCE_TYPE=c6in.metal
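# Manual IMDSv2 lookup, equivalent to the adapter's fallback (standard
# EC2 metadata API; the -m 1 timeout mirrors the adapter's 1s):
#   TOKEN=$(curl -sS -m 1 -X PUT http://169.254.169.254/latest/api/token \
#     -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
#   curl -sS -m 1 -H "X-aws-ec2-metadata-token: $TOKEN" \
#     http://169.254.169.254/latest/meta-data/instance-type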
# Egress bandwidth reservation (ExecStartPre runs reserve-control-bandwidth.sh).
# Reserves a guaranteed slice of the NIC for control-plane traffic so a busy
# zmap cannot drop the worker off /api/workers. Fails open: any tc/iptables
# error is logged and ignored.
# ANYSCAN_RESERVE_DISABLE=true             # opt out entirely
# ANYSCAN_RESERVE_INTERFACE=eth0           # installer auto-detects from the
#                                          # default route. For multi-NIC sharded
#                                          # workers set a comma-separated list
#                                          # so the qdisc reservation is
#                                          # installed on every NIC the scanner
#                                          # drives, not just the default-route
#                                          # one. Alias:
#                                          # ANYSCAN_RESERVE_INTERFACES.
# ANYSCAN_RESERVE_BANDWIDTH_BPS=5000000    # 5 Mbit/s reserved for control plane
# ANYSCAN_RESERVE_LINK_RATE_BPS=1000000000 # 1 Gbit/s link ceiling
# ANYSCAN_CONTROL_PLANE_HOST=              # comma-separated; when unset, parsed
#                                          # from CONTROL_URL
# ANYSCAN_RESERVE_TOR_CGROUP_PATH=system.slice/agentd-tunnel.service
# ANYSCAN_RESERVE_TOR_PORTS=9001,9030,9101
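# Shape of the reservation, as a minimal HTB sketch (illustrative only;
# reserve-control-bandwidth.sh's real classes, filters, and cgroup
# matching are not reproduced here):
#   tc qdisc add dev eth0 root handle 1: htb default 20
#   tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit
#   tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit ceil 1gbit prio 0     # control plane
#   tc class add dev eth0 parent 1:1 classid 1:20 htb rate 995mbit ceil 995mbit prio 1 # bulk scan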
# Scanner I/O engine. Phase 2 PR D of plans/2026-04-27-portscan-afxdp-plan-v1.md
# §3.7 added af_xdp; plans/2026-04-28-portscan-dpdk-impl-v1.md §3.10 adds dpdk.
# The bundled scanner exposes --io-engine={af_packet,af_xdp,pfring_zc,dpdk};
# the adapter reads this knob and forwards it as the flag value. AF_PACKET
# is the unconditional default and the unconditional fallback. AF_XDP is
# opt-in per worker and is only honored when:
#   1. ANYSCAN_AF_XDP_AVAILABLE=true, which install-worker-bundle.sh writes
#      after probing kernel >=5.10 + libxdp.so loadable; AND
#   2. The systemd unit grants CAP_BPF (anyscan-worker.service /
#      anyscan-worker-only.service already include it so the operator
#      does not have to refresh the unit when flipping this knob).
# pfring_zc is similarly opt-in and only honored when
# ANYSCAN_PFRING_ZC_AVAILABLE=true (install-time probe: pfring kernel
# module loaded + libpfring.so loadable + commercial license file
# present).
# When ANYSCAN_SCANNER_IO_ENGINE=af_xdp/pfring_zc is requested but the
# corresponding _AVAILABLE knob is false (or absent), the adapter logs a
# warning to stderr and falls back to af_packet so the scanner does not
# crash at startup with a dlopen / cluster-init error. Set to "af_packet"
# or leave unset to stay on the known-safe default.
# ANYSCAN_SCANNER_IO_ENGINE=af_packet
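# Example AF_XDP opt-in on a worker where the install probe passed:
#   ANYSCAN_SCANNER_IO_ENGINE=af_xdp
#   ANYSCAN_AF_XDP_AVAILABLE=true   # normally written by install-worker-bundle.sh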
#
# Build-time AF_XDP opt-in (read by install-external-deps.sh,
# package-worker-bundle.sh, deploy.sh — NOT by the agentd runtime). The
# runtime knobs above only matter when the scanner C source was actually
# compiled with `make USE_AF_XDP=1`; without that, --io-engine=af_xdp has
# no AF_XDP code linked and the scanner falls back at startup. The
# anygpt-42 wire-up gap was that none of the build/bundle/deploy scripts
# forwarded this flag. Set ANYSCAN_USE_AF_XDP=1 in the operator's shell
# (or systemd EnvironmentFile) before running the build/bundle pipeline
# to produce a libxdp-linked scanner; the build path force-rebuilds when
# it detects a cached AF_PACKET-only binary. Keep at 0 to stay on the
# AF_PACKET-only build that has shipped historically. See
# plans/2026-04-27-portscan-afxdp-plan-v1.md §3.6.
# ANYSCAN_USE_AF_XDP=0
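# Example build-side opt-in (script names from above; invocation paths and
# arguments follow your deployment, shown bare here as a sketch):
#   export ANYSCAN_USE_AF_XDP=1
#   ./install-external-deps.sh && ./package-worker-bundle.sh && ./deploy.sh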
#
# Build-time PF_RING ZC opt-in (anygpt-46). Same shape and audience as
# ANYSCAN_USE_AF_XDP — read by install-external-deps.sh,
# package-worker-bundle.sh, deploy.sh, NOT by the agentd runtime. Setting
# 1 forwards `USE_PFRING_ZC=1` to the engine make, which links
# `-lpfring -lpcap` and pulls in src/{send,recv}-pfring.c so the
# io_engine_pfring_zc vtable in engine.c is reachable from
# pick_io_engine. Without this flag the ZC source files compile to
# nothing under their #ifdef and --io-engine=pfring_zc errors at startup.
#
# LICENSE NOTE — PF_RING ZC is a commercial library from ntop. At runtime
# libpfring requires a per-host commercial license to operate at full
# kernel-bypass speed; without the license the runtime falls back to a
# community/demo mode that throttles ZC traffic to ~100k pps, which is
# *below* the throughput AF_PACKET already sustains. Flipping
# ANYSCAN_USE_PFRING_ZC=1 without provisioning a license therefore
# regresses scan throughput. License procurement is an operator
# responsibility, tracked outside this codebase.
#
# RUNTIME PREREQUISITES on the host running the bundled scanner:
#   - pfring kernel module loaded (`modprobe pfring`); install via
#     ntop's apt-stable repo (https://packages.ntop.org/apt-stable/) or
#     the dkms package pfring-dkms.
#   - libpfring.so on the dynamic linker path (libpfring-dev provides
#     the development headers + .so symlink).
#   - PF_RING ZC license file at the location libpfring expects
#     (typically /etc/pf_ring/license).
# install-worker-bundle.sh's runtime probe checks the kernel module +
# libpfring.so visibility and writes ANYSCAN_PFRING_ZC_AVAILABLE so the
# adapter can gate downgrades without per-scan overhead.
# ANYSCAN_USE_PFRING_ZC=0
#
# Runtime PF_RING ZC availability flag (written by install-worker-bundle.sh
# at install time; safe for operators to override only when they know the
# probe is wrong, e.g. libpfring is on a path not visible to ldconfig).
# Defaults to false; when ANYSCAN_SCANNER_IO_ENGINE=pfring_zc the adapter
# refuses to forward the flag unless this is true.
# ANYSCAN_PFRING_ZC_AVAILABLE=false
# DPDK userspace-networking I/O engine. Phase 2 of
# plans/2026-04-28-portscan-dpdk-impl-v1.md. DPDK is opt-in per worker and
# is only honored when:
#   1. ANYSCAN_DPDK_AVAILABLE=true, which install-worker-bundle.sh writes
#      after probing librte_eal.so loadable + scanner USE_DPDK-built +
#      vfio_pci kernel module loaded + hugepages reserved at
#      /sys/kernel/mm/hugepages/* + /dev/vfio/vfio present; AND
#   2. tools/setup-dpdk.sh has been run successfully on the host, binding
#      the listed ENIs to vfio-pci and reserving hugepages; AND
#   3. The systemd unit grants CAP_SYS_RAWIO + CAP_IPC_LOCK + CAP_NET_ADMIN
#      (Phase 2 systemd-unit edit; until that lands operators must add
#      these caps manually before flipping the runtime knob).
# When ANYSCAN_SCANNER_IO_ENGINE=dpdk is requested but ANYSCAN_DPDK_AVAILABLE
# is false, the adapter logs a warning to stderr and falls back to
# af_packet so the scanner does not crash at startup with a dlopen / EAL
# init error.
#
# Build-time DPDK opt-in (read by install-external-deps.sh,
# package-worker-bundle.sh, deploy.sh — NOT by the agentd runtime). The
# runtime knobs above only matter when the scanner C source was actually
# compiled with `make USE_DPDK=1`; without that, --io-engine=dpdk has no
# DPDK code linked and the scanner falls back at parse time. Keep at 0
# to stay on the AF_PACKET-only build that has shipped historically.
# ANYSCAN_USE_DPDK=0
#
# Runtime DPDK availability flag (written by install-worker-bundle.sh at
# install time; safe for operators to override only when they know the
# probe is wrong, e.g. when /dev/vfio/vfio appears after the install
# probe ran). Defaults to false; when ANYSCAN_SCANNER_IO_ENGINE=dpdk the
# adapter refuses to forward the flag unless this is true.
# ANYSCAN_DPDK_AVAILABLE=false
#
# DPDK NIC binding. Comma-separated PCI BDFs (e.g. "0000:00:06.0,0000:00:07.0")
# OR comma-separated kernel iface names (e.g. "eth1,eth2"); tools/setup-dpdk.sh
# resolves iface names to BDFs at bind time. tools/setup-dpdk.sh refuses to
# bind eth0 (the agentd control-plane interface) and refuses to bind the
# only NIC in a single-NIC instance, so eth0 always retains kernel
# networking. On c6in.metal with eth0..eth7, set this to eth1..eth7 (or
# their BDFs) — eth0 stays kernel-bound for agentd heartbeat.
# ANYSCAN_DPDK_PCI_BDFS=
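# Resolving an iface name to its PCI BDF by hand (standard sysfs layout,
# the same mapping setup-dpdk.sh performs at bind time):
#   basename "$(readlink -f /sys/class/net/eth1/device)"   # -> e.g. 0000:00:06.0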
#
# DPDK hugepages reservation in GiB. Default 4. c6in.metal has 192 GiB so
# 4 GiB is a rounding error; smaller instance shapes may want less. The
# install-time probe asserts at least 1 GiB worth of hugepages are
# reserved before marking DPDK available, but the actual reservation is
# done by tools/setup-dpdk.sh on the host (sysctl vm.nr_hugepages writes
# require root, so this script keeps that as an explicit operator step).
# ANYSCAN_DPDK_HUGEPAGES_GB=4
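# Worked hugepage math for the default, assuming the common 2 MiB page
# size (the reservation itself stays an explicit operator step):
#   4 GiB / 2 MiB = 2048 pages  ->  sysctl -w vm.nr_hugepages=2048
#   grep HugePages /proc/meminfo   # verify the reservation took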
# Opt-in kernel backport upgrade (read by install-external-deps.sh,
# package-worker-bundle.sh, deploy.sh — NOT by the agentd runtime).
# Set 1 to install the Debian bookworm-backports kernel image
# (linux-image-cloud-amd64 by default) on a Debian-family host whose
# stock kernel is older than 6.16. Kernel 6.16+ carries the in-flight
# `ena_xdp_zc` ENA driver patches that AF_XDP zerocopy on ENA needs;
# without them the scanner's AF_XDP path falls back to drv+copy and
# caps c6in.metal 8-NIC cap=4 throughput at ~22M pps (anygpt-42 live
# bench, vs the 30-50M projection in
# plans/2026-04-27-portscan-afxdp-plan-v1.md §10).
#
# Default 0 — existing AMIs are unchanged. The scripts NEVER
# auto-reboot; the kernel image is staged on disk and the operator
# schedules the reboot. After install the scripts probe
# /sys/module/ena/version + dmesg for `ena_xdp_zc` support and warn
# if absent so the operator knows whether the CURRENTLY-RUNNING
# kernel will deliver zerocopy. Out of scope here: AMI rebuild,
# auto-reboot, the ena driver patches themselves.
#
# Override the package / suite / source list / mirror with the
# matching ANYSCAN_KERNEL_BACKPORT_* variables if you carry a
# different backport channel (e.g. an internal Debian mirror).
# ANYSCAN_INSTALL_KERNEL_BACKPORT=0
# ANYSCAN_KERNEL_BACKPORT_MIN_VERSION=6.16
# ANYSCAN_KERNEL_BACKPORT_PACKAGE=linux-image-cloud-amd64
# ANYSCAN_KERNEL_BACKPORT_SUITE=bookworm-backports
# ANYSCAN_KERNEL_BACKPORT_SOURCES_LIST=/etc/apt/sources.list.d/anyscan-bookworm-backports.list
# ANYSCAN_KERNEL_BACKPORT_MIRROR=http://deb.debian.org/debian
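# Equivalent manual steps on a stock bookworm host (standard backports
# flow; the scripts drive the same steps from the variables above):
#   echo "deb http://deb.debian.org/debian bookworm-backports main" \
#     > /etc/apt/sources.list.d/anyscan-bookworm-backports.list
#   apt-get update
#   apt-get install -t bookworm-backports linux-image-cloud-amd64
#   # kernel is staged only; schedule the reboot yourself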
# Installed bundle asset locations
EXTENSION_MANIFEST_PATHS=/opt/agentd/extensions/bootstrap-provisioner.json,/opt/agentd/extensions/portscan-adapter.json
ARTIFACT_DIR=/var/lib/agentd/artifacts
SCANNER_BIN=/opt/agentd/bin/scanner
# Remote self-update
AGENT_REMOTE_UPDATE_ENABLED=true
AGENT_REMOTE_UPDATE_REQUEST_FILE=/var/lib/agentd/remote-update.request
AGENT_REMOTE_UPDATE_STATUS_FILE=/var/lib/agentd/remote-update.status
# AGENT_REMOTE_UPDATE_INSTALLER_URL=optional explicit override; defaults to AGENT_MANAGEMENT_URL/api/agent/install.sh
AGENT_REMOTE_UPDATE_BACKUP_ROOT=/var/lib/agentd/update-backups
AGENT_REMOTE_UPDATE_HEALTHCHECK_TIMEOUT_SECONDS=120
AGENT_REMOTE_UPDATE_HEALTHCHECK_INTERVAL_SECONDS=5
AGENT_REMOTE_UPDATE_ROLLBACK_ON_FAILURE=true
AGENT_REMOTE_DEBUG_ENABLED=true
# Optional worker-local inventory overrides
# ALLOWED_HOST_SUFFIXES=example.com
# ALLOWED_HOSTS=
# ALLOWED_CIDRS=
# ALLOWED_PORTS=