6 changes: 3 additions & 3 deletions docs/configuration.md
@@ -24,9 +24,9 @@ under the License.
Spark Operator supports different ways to configure the behavior:

* **spark-operator.properties** provided when deploying the operator. In addition to the
[property file](../build-tools/helm/spark-kubernetes-operator/conf/spark-operator.
properties), it is also possible to override or append config properties in helm [Values
files](../build-tools/helm/spark-kubernetes-operator/values.yaml).
[property file](../build-tools/helm/spark-kubernetes-operator/conf/spark-operator.properties),
it is also possible to override or append config properties in helm
[Values files](../build-tools/helm/spark-kubernetes-operator/values.yaml).
* **System Properties** : when provided as system properties (e.g. via -D options to the
  operator JVM), they override the values provided in the property file.
* **Hot property loading** : when enabled, a
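The helm-values override mechanism described above can be sketched as follows (the `operatorConfiguration` key layout is assumed from the chart's default `values.yaml`; treat the exact nesting as illustrative, and `my-values.yaml` as a hypothetical file name):

```yaml
# my-values.yaml: override/append operator properties at install time, e.g.
#   helm install ... -f my-values.yaml
# A -D system property on the operator JVM (e.g.
#   -Dspark.kubernetes.operator.reconciler.intervalSeconds=15)
# would in turn override the value set here.
operatorConfiguration:
  spark-operator.properties: |+
    spark.kubernetes.operator.reconciler.intervalSeconds=30
```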
12 changes: 6 additions & 6 deletions docs/operations.md
@@ -21,9 +21,9 @@ under the License.

## Compatibility

- Java 21, 25 and 26
- Java 21 or newer
- Kubernetes version compatibility:
- k8s version >= 1.34 is recommended. Operator attempts to be API compatible as possible, but
- k8s version >= 1.34 is recommended. Operator attempts to be as API compatible as possible, but
patch support will not be performed on k8s versions that reached EOL.
- Spark versions 3.5 or above.

@@ -122,7 +122,7 @@ following table:
| operatorConfiguration.spark-operator.properties | The default operator configuration. | |
| operatorConfiguration.metrics.properties | The default operator metrics (sink) configuration. | |
| operatorConfiguration.dynamicConfig.create | If set to true, a config map would be created & watched by the operator as the source of truth for hot properties loading. | false |
| operatorConfiguration.dynamicConfig.enable | If set to true, operator would honor the created config mapas source of truth for hot properties loading. | false |
| operatorConfiguration.dynamicConfig.enable | If set to true, operator would honor the created config map as source of truth for hot properties loading. | false |
| operatorConfiguration.dynamicConfig.annotations | Annotations to be applied for the dynamicConfig resources. | `"helm.sh/resource-policy": keep` |
| operatorConfiguration.dynamicConfig.data | Data field (key-value pairs) that acts as hot properties in the config map. | `spark.kubernetes.operator.reconciler.intervalSeconds: "60"` |
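The dynamic-config settings in the table above can be combined in a values file; a minimal sketch (key names are taken from the table, and the annotation and data values mirror the listed defaults):

```yaml
operatorConfiguration:
  dynamicConfig:
    # Create the config map and honor it as the source of truth
    # for hot properties loading
    create: true
    enable: true
    annotations:
      "helm.sh/resource-policy": keep
    data:
      spark.kubernetes.operator.reconciler.intervalSeconds: "60"
```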

@@ -172,9 +172,9 @@ Check installation.

```bash
$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
us-west-1 us-west-1 1 2025-10-08 22:04:45.530136 -0700 PDT deployed spark-kubernetes-operator-1.3.0 0.5.0
us-west-2 us-west-2 1 2025-10-08 22:04:48.747434 -0700 PDT deployed spark-kubernetes-operator-1.3.0 0.5.0
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
us-west-1 us-west-1 1 2026-05-06 10:00:00.000000 -0700 PDT deployed spark-kubernetes-operator-1.7.0-dev 0.9.0-SNAPSHOT
us-west-2 us-west-2 1 2026-05-06 10:00:03.000000 -0700 PDT deployed spark-kubernetes-operator-1.7.0-dev 0.9.0-SNAPSHOT
```

Launch `pi.yaml` at `us-west-1` and `us-west-2` namespaces.
34 changes: 17 additions & 17 deletions docs/spark_custom_resources.md
@@ -40,20 +40,20 @@ kind: SparkApplication
metadata:
name: pi
spec:
# Entry point for the app
# Entry point for the app
mainClass: "org.apache.spark.examples.SparkPi"
jars: "local:///opt/spark/examples/jars/spark-examples.jar"
sparkConf:
spark.dynamicAllocation.enabled: "true"
spark.dynamicAllocation.shuffleTracking.enabled: "true"
spark.dynamicAllocation.maxExecutors: "3"
spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
spark.kubernetes.container.image: "apache/spark:4.0.0"
spark.kubernetes.container.image: "apache/spark:4.1.1-scala"
applicationTolerations:
resourceRetainPolicy: OnFailure
ttlAfterStopMillis: 10000
runtimeVersions:
scalaVersion: "2.13"
sparkVersion: "4.0.0"
sparkVersion: "4.1.1"
```

After application is submitted, Operator will add status information to your application based on
@@ -204,13 +204,12 @@ are creating / managing SparkApplications with external microservices or workflo
Spark Operator recognizes "infrastructure failure" in a best-effort way. It is possible to
configure different restart policies for general failure(s) vs. potential infrastructure
failure(s). For example, you may configure the app to restart only upon infrastructure
failures. If Spark application fails as a result of `DriverStartTimedOut`,
`ExecutorsStartTimedOut`, `SchedulingFailure`.

It is more likely that the app failed as a result of infrastructure reason(s), including
scenarios like driver or executors cannot be scheduled or cannot initialize in configured
time window for scheduler reasons, as a result of insufficient capacity, cannot get IP
allocated, cannot pull images, or k8s API server issue at scheduling .etc.
failures. If a Spark application fails with `DriverStartTimedOut`, `ExecutorsStartTimedOut`,
or `SchedulingFailure`, it is more likely that the app failed for infrastructure
reasons: the driver or executors could not be scheduled or could not initialize within
the configured time window, e.g. due to insufficient capacity, IP allocation failures,
image pull failures, or k8s API server issues at scheduling time.
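
A minimal sketch of the infra-only restart behavior described above (the `OnInfrastructureFailure` policy value is an assumption here; verify the exact enum against the CRD reference):

```yaml
applicationTolerations:
  restartConfig:
    # Restart only when the failure looks like an infrastructure issue
    # (e.g. DriverStartTimedOut, ExecutorsStartTimedOut, SchedulingFailure)
    restartPolicy: OnInfrastructureFailure
    maxRestartAttempts: 3
```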

Please be advised that this is a best-effort failure identification. You may still need to
debug actual failure from the driver pods. Spark Operator would stage the last observed
@@ -250,11 +249,12 @@ The operator maintains multiple counters to track different types of restarts:
- Consecutive failure tracking: The failure-specific counters track consecutive failures
of the app, distinguishing between persistent failures (requiring intervention) and
transient issues (safe for retry).
- For Example: With `restartPolicy=Always`, `maxRestartAttempts=5` and `maxRestartOnFailure=2`:
- The app would tolerate at maximum of 3 consecutive failures, with maximal of 5 restarts
- In other words, sequence F -> F -> F would stop.
- sequence F -> S -> F -> S -> F would continue with the 5th restart as the succeeded attempts
reset the failure counter
- Example: with `restartPolicy=Always`, `maxRestartAttempts=5`, and `maxRestartOnFailure=2`:
- The app tolerates at most 2 consecutive failures; the 3rd consecutive failure stops it,
within an overall cap of 5 total restarts.
- In other words, the sequence F -> F -> F stops on the 3rd F.
- The sequence F -> S -> F -> S -> F continues, because each successful attempt
resets the consecutive-failure counter.
- Granular control over `SchedulingFailure`: similarly, it is possible to control the maximum
  number of restarts and the backoff interval for consecutive `SchedulingFailure` attempts, as
  these are often associated with API server rejections, exceeded quotas, or resource constraints.
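
The counter semantics in the example above could be expressed as the following sketch (the field names `maxRestartAttempts` and `maxRestartOnFailure` are taken from the prose; verify them against the CRD reference):

```yaml
applicationTolerations:
  restartConfig:
    restartPolicy: Always
    maxRestartAttempts: 5    # overall cap across all restarts
    maxRestartOnFailure: 2   # consecutive-failure cap; reset by a success
```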
@@ -438,7 +438,7 @@ For example, if an app with below configuration:
applicationTolerations:
restartConfig:
restartPolicy: OnFailure
maxRestartAttempts: 1
maxRestartAttempts: 1
resourceRetainPolicy: Always
resourceRetainDurationMillis: 30000
ttlAfterStopMillis: 60000