Commit 72aac4d

Add/tbon connect timeout test (#162)

* adding tbon connect timeout
* re-organize log level and option flags too

This will resolve an issue that zeromq by default has a timeout on retry with exponential backoff. Although there is still a remaining issue in Kubernetes that adding a service pod seems to improve network readiness, for now this addition of the zeromq timeout will reduce the startup times from grossly large to acceptably so.

Signed-off-by: vsoch <[email protected]>
1 parent ea422ce commit 72aac4d

30 files changed: +1406 -694 lines changed

README.md

Lines changed: 5 additions & 7 deletions
@@ -13,15 +13,13 @@ Read more, including user and developer guides, and project background in our
 **Important!** We recently removed a one-off container that ran before the MiniCluster creation to generate a certificate.
 We have found [through testing](https://github.com/kubernetes-sigs/jobset/issues/104) that this somehow served as a warmup
 for networking, and this means if you use the latest operator here, you may see slow times in creating the initial
-broker setup. We have two sets of bugfixes to go in that should resolve this:
+broker setup. More details are available in [this post](https://github.com/converged-computing/operator-experiments/tree/main/google/service-timing).
+We have fixed the zeromq timeout bug, and will hopefully be able to reproduce the issue outside of the operator
+soon to report upstream.

-- An update to set the zeromq timeout (TBA soon)
-- a TBA update to resolve whatever the noticed bug is above (TBA unknown)
+## Presentations

-With the bug, you might see creation times between 40-140 seconds for a single MiniCluster, which is abysmal.
-With the fix to zeromq, this goes does to 19-20. With the further addition of adding the warmup service, it goes
-down to ~16. With the service plus a better networking setup than kube-dns, it returns to the original 11-12 seconds.
-Thank you for your patience as we work on this - we will hopefully get everything resolved soon!
+- [Kubecon 2023](https://t.co/vjRydPx1rb)

 ## Organization

api/v1alpha1/minicluster_types.go

Lines changed: 25 additions & 13 deletions
@@ -48,6 +48,10 @@ type MiniClusterSpec struct {
 	// +optional
 	Interactive bool `json:"interactive"`

+	// Flux options for the broker, shared across cluster
+	// +optional
+	Flux FluxSpec `json:"flux"`
+
 	// Volumes accessible to containers from a host
 	// Not all containers are required to use them
 	// +optional
@@ -300,6 +304,27 @@ type ContainerVolume struct {
 	ReadOnly bool `json:"readOnly,omitempty"`
 }

+type FluxSpec struct {
+
+	// Single user executable to provide to flux start
+	// +kubebuilder:default="5s"
+	// +default="5s"
+	ConnectTimeout string `json:"connectTimeout,omitempty"`
+
+	// Flux option flags, usually provided with -o
+	// optional - if needed, default option flags for the server
+	// These can also be set in the user interface to override here.
+	// This is only valid for a FluxRunner "runFlux" true
+	// +optional
+	OptionFlags string `json:"optionFlags"`
+
+	// Log level to use for flux logging (only in non TestMode)
+	// +kubebuilder:default=6
+	// +default=6
+	// +optional
+	LogLevel int32 `json:"logLevel,omitempty"`
+}
+
 type MiniClusterContainer struct {

 	// Container image must contain flux and flux-sched install
@@ -386,19 +411,6 @@ type MiniClusterContainer struct {
 	// +optional
 	ExistingVolumes map[string]MiniClusterExistingVolume `json:"existingVolumes"`

-	// Flux option flags, usually provided with -o
-	// optional - if needed, default option flags for the server
-	// These can also be set in the user interface to override here.
-	// This is only valid for a FluxRunner "runFlux" true
-	// +optional
-	FluxOptionFlags string `json:"fluxOptionFlags"`
-
-	// Log level to use for flux logging (only in non TestMode)
-	// +kubebuilder:default=6
-	// +default=6
-	// +optional
-	FluxLogLevel int32 `json:"fluxLogLevel,omitempty"`
-
 	// Special command to run at beginning of script, directly after asFlux
 	// is defined as sudo -u flux -E (so you can change that if desired.)
 	// This is only valid if FluxRunner is set (that writes a wait.sh script)

api/v1alpha1/swagger.json

Lines changed: 26 additions & 11 deletions
@@ -115,6 +115,27 @@
     }
   }
 },
+"FluxSpec": {
+  "type": "object",
+  "properties": {
+    "connectTimeout": {
+      "description": "Single user executable to provide to flux start",
+      "type": "string",
+      "default": "5s"
+    },
+    "logLevel": {
+      "description": "Log level to use for flux logging (only in non TestMode)",
+      "type": "integer",
+      "format": "int32",
+      "default": 6
+    },
+    "optionFlags": {
+      "description": "Flux option flags, usually provided with -o optional - if needed, default option flags for the server These can also be set in the user interface to override here. This is only valid for a FluxRunner \"runFlux\" true",
+      "type": "string",
+      "default": ""
+    }
+  }
+},
 "FluxUser": {
   "type": "object",
   "properties": {
@@ -258,17 +279,6 @@
     "$ref": "#/definitions/MiniClusterExistingVolume"
   }
 },
-"fluxLogLevel": {
-  "description": "Log level to use for flux logging (only in non TestMode)",
-  "type": "integer",
-  "format": "int32",
-  "default": 6
-},
-"fluxOptionFlags": {
-  "description": "Flux option flags, usually provided with -o optional - if needed, default option flags for the server These can also be set in the user interface to override here. This is only valid for a FluxRunner \"runFlux\" true",
-  "type": "string",
-  "default": ""
-},
 "fluxUser": {
   "description": "Flux User, if created in the container",
   "default": {},
@@ -436,6 +446,11 @@
   "format": "int64",
   "default": 31500000
 },
+"flux": {
+  "description": "Flux options for the broker, shared across cluster",
+  "default": {},
+  "$ref": "#/definitions/FluxSpec"
+},
 "fluxRestful": {
   "description": "Customization to Flux Restful API There should only be one container to run flux with runFlux",
   "default": {},

api/v1alpha1/zz_generated.deepcopy.go

Lines changed: 16 additions & 0 deletions
Generated file; diff not rendered by default.

api/v1alpha1/zz_generated.openapi.go

Lines changed: 45 additions & 17 deletions
Generated file; diff not rendered by default.

chart/templates/minicluster-crd.yaml

Lines changed: 19 additions & 22 deletions
@@ -122,17 +122,6 @@ spec:
     type: object
   description: Existing Volumes to add to the containers
   type: object
-fluxLogLevel:
-  default: 6
-  description: Log level to use for flux logging (only in non TestMode)
-  format: int32
-  type: integer
-fluxOptionFlags:
-  description: Flux option flags, usually provided with -o optional
-    - if needed, default option flags for the server These can also
-    be set in the user interface to override here. This is only
-    valid for a FluxRunner "runFlux" true
-  type: string
 fluxUser:
   description: Flux User, if created in the container
   properties:
@@ -249,6 +238,25 @@ spec:
     Approximately one year. This cannot be zero or job won't start
   format: int64
   type: integer
+flux:
+  description: Flux options for the broker, shared across cluster
+  properties:
+    connectTimeout:
+      default: 5s
+      description: Single user executable to provide to flux start
+      type: string
+    logLevel:
+      default: 6
+      description: Log level to use for flux logging (only in non TestMode)
+      format: int32
+      type: integer
+    optionFlags:
+      description: Flux option flags, usually provided with -o optional
+        - if needed, default option flags for the server These can also
+        be set in the user interface to override here. This is only valid
+        for a FluxRunner "runFlux" true
+      type: string
+  type: object
 fluxRestful:
   description: Customization to Flux Restful API There should only be
     one container to run flux with runFlux
@@ -420,17 +428,6 @@ spec:
     type: object
   description: Existing Volumes to add to the containers
   type: object
-fluxLogLevel:
-  default: 6
-  description: Log level to use for flux logging (only in non TestMode)
-  format: int32
-  type: integer
-fluxOptionFlags:
-  description: Flux option flags, usually provided with -o optional
-    - if needed, default option flags for the server These can also
-    be set in the user interface to override here. This is only
-    valid for a FluxRunner "runFlux" true
-  type: string
 fluxUser:
   description: Flux User, if created in the container
   properties:
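
The schema above declares defaults of `5s` for `connectTimeout` and `6` for `logLevel`. Code that builds a spec outside the cluster, for example in a unit test, does not pass through that schema, so it may want to fill in the same values itself. The following is a minimal sketch under that assumption. `withSchemaDefaults` is a hypothetical helper and is not taken from the operator.

```go
package main

import "fmt"

// FluxSpec mirrors the fields added in this commit (struct tags omitted).
type FluxSpec struct {
	ConnectTimeout string
	OptionFlags    string
	LogLevel       int32
}

// withSchemaDefaults is a hypothetical helper that fills in the default
// values declared in the CRD schema when a field is left at its zero value.
// Caveat: an int32 cannot distinguish "unset" from an explicit 0, so a
// requested log level of 0 would be overwritten by this sketch.
func withSchemaDefaults(spec FluxSpec) FluxSpec {
	if spec.ConnectTimeout == "" {
		spec.ConnectTimeout = "5s"
	}
	if spec.LogLevel == 0 {
		spec.LogLevel = 6
	}
	return spec
}

func main() {
	// The flag string is a placeholder, not a real Flux option.
	spec := withSchemaDefaults(FluxSpec{OptionFlags: "-o placeholder"})
	fmt.Printf("%+v\n", spec)
}
```

The helper takes and returns the struct by value, so the caller's original spec is left unchanged.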

config/crd/bases/flux-framework.org_miniclusters.yaml

Lines changed: 19 additions & 24 deletions
@@ -122,18 +122,6 @@ spec:
     type: object
   description: Existing Volumes to add to the containers
   type: object
-fluxLogLevel:
-  default: 6
-  description: Log level to use for flux logging (only in non
-    TestMode)
-  format: int32
-  type: integer
-fluxOptionFlags:
-  description: Flux option flags, usually provided with -o optional
-    - if needed, default option flags for the server These can
-    also be set in the user interface to override here. This is
-    only valid for a FluxRunner "runFlux" true
-  type: string
 fluxUser:
   description: Flux User, if created in the container
   properties:
@@ -252,6 +240,25 @@ spec:
     Approximately one year. This cannot be zero or job won't start
   format: int64
   type: integer
+flux:
+  description: Flux options for the broker, shared across cluster
+  properties:
+    connectTimeout:
+      default: 5s
+      description: Single user executable to provide to flux start
+      type: string
+    logLevel:
+      default: 6
+      description: Log level to use for flux logging (only in non TestMode)
+      format: int32
+      type: integer
+    optionFlags:
+      description: Flux option flags, usually provided with -o optional
+        - if needed, default option flags for the server These can also
+        be set in the user interface to override here. This is only
+        valid for a FluxRunner "runFlux" true
+      type: string
+  type: object
 fluxRestful:
   description: Customization to Flux Restful API There should only be
     one container to run flux with runFlux
@@ -424,18 +431,6 @@ spec:
     type: object
   description: Existing Volumes to add to the containers
   type: object
-fluxLogLevel:
-  default: 6
-  description: Log level to use for flux logging (only in non
-    TestMode)
-  format: int32
-  type: integer
-fluxOptionFlags:
-  description: Flux option flags, usually provided with -o optional
-    - if needed, default option flags for the server These can
-    also be set in the user interface to override here. This is
-    only valid for a FluxRunner "runFlux" true
-  type: string
 fluxUser:
   description: Flux User, if created in the container
   properties:
