Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions modules/nw-openstack-hw-offload-testpmd-pod.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-openstack-hw-offload-testpmd-pod_{context}"]
= A test pod template for clusters that use OVS hardware offloading on OpenStack

[role="_abstract"]
The following `testpmd` pod demonstrates Open vSwitch (OVS) hardware offloading on {rh-openstack-first}.

.An example `testpmd` pod
Expand All @@ -19,7 +20,7 @@ metadata:
annotations:
k8s.v1.cni.cncf.io/networks: hwoffload1
spec:
runtimeClassName: performance-cnf-performanceprofile <1>
runtimeClassName: performance-cnf-performanceprofile
containers:
- name: testpmd
command: ["sleep", "99999"]
Expand Down Expand Up @@ -47,4 +48,7 @@ spec:
emptyDir:
medium: HugePages
----
<1> If your performance profile is not named `cnf-performance profile`, replace that string with the correct performance profile name.

--
* If your performance profile is not named `cnf-performance profile`, replace that string with the correct performance profile name.
--
8 changes: 6 additions & 2 deletions modules/nw-openstack-sr-iov-testpmd-pod.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-openstack-ovs-sr-iov-testpmd-pod_{context}"]
= A test pod template for clusters that use SR-IOV on OpenStack

[role="_abstract"]
The following `testpmd` pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.

.An example `testpmd` pod
Expand Down Expand Up @@ -45,10 +46,13 @@ spec:
- mountPath: /dev/hugepages
name: hugepage
readOnly: False
runtimeClassName: performance-cnf-performanceprofile <1>
runtimeClassName: performance-cnf-performanceprofile
volumes:
- name: hugepage
emptyDir:
medium: HugePages
----
<1> This example assumes that the name of the performance profile is `cnf-performance profile`.

--
* This example assumes that the name of the performance profile is `cnf-performance profile`.
--
12 changes: 8 additions & 4 deletions modules/nw-sr-iov-network-node-configuration-examples.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-sr-iov-network-node-configuration-examples_{context}"]
= SR-IOV network node configuration examples

[role="_abstract"]
The following example describes the configuration for an InfiniBand device:

.Example configuration for an InfiniBand device
Expand Down Expand Up @@ -45,13 +46,16 @@ spec:
resourceName: <sriov_resource_name>
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 1 <1>
numVfs: 1
nicSelector:
vendor: "<vendor_code>"
deviceID: "<device_id>"
netFilter: "openstack/NetworkID:ea24bd04-8674-4f69-b0ee-fa0b3bd20509" <2>
netFilter: "openstack/NetworkID:ea24bd04-8674-4f69-b0ee-fa0b3bd20509"
# ...
----
<1> When configuring the node network policy for a virtual machine, the `numVfs` parameter is always set to `1`.
<2> When the virtual machine is deployed on {rh-openstack}, the `netFilter` parameter must refer to a network ID. Valid values for `netFilter` are available from an `SriovNetworkNodeState` object.

--
* When configuring the node network policy for a virtual machine, the `numVfs` parameter is always set to `1`.
* When the virtual machine is deployed on {rh-openstack}, the `netFilter` parameter must refer to a network ID. Valid values for `netFilter` are available from an `SriovNetworkNodeState` object.
--

8 changes: 6 additions & 2 deletions modules/nw-sriov-device-discovery.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
[id="discover-sr-iov-devices_{context}"]
= Automated discovery of SR-IOV network devices

[role="_abstract"]
The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes.
The Operator creates and updates a SriovNetworkNodeState custom resource (CR) for each worker node that provides a compatible SR-IOV network device.

Expand Down Expand Up @@ -79,5 +80,8 @@ status:
vendor: "8086"
syncStatus: Succeeded
----
<1> The value of the `name` field is the same as the name of the worker node.
<2> The `interfaces` stanza includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.

--
* The value of the `name` field is the same as the name of the worker node.
* The `interfaces` stanza includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.
--
1 change: 1 addition & 0 deletions modules/nw-sriov-exclude-topology-manager.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-sriov-exclude-topology-manager_{context}"]
= Exclude the SR-IOV network topology for NUMA-aware scheduling

[role="_abstract"]
You can exclude advertising the Non-Uniform Memory Access (NUMA) node for the SR-IOV network to the Topology Manager for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.

In some scenarios, it is a priority to maximize CPU and memory resources for a pod on a single NUMA node. By not providing a hint to the Topology Manager about the NUMA node for the pod's SR-IOV network resource, the Topology Manager can deploy the SR-IOV network resource and the pod CPU and memory resources to different NUMA nodes. This can add to network latency because of the data transfer between NUMA nodes. However, it is acceptable in scenarios when workloads require optimal CPU and memory performance.
Expand Down
1 change: 1 addition & 0 deletions modules/nw-sriov-huge-pages.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-sriov-hugepages_{context}"]
= Huge pages resource injection for Downward API

[role="_abstract"]
When a pod specification includes a resource request or limit for huge pages, the Network Resources Injector automatically adds Downward API fields to the pod specification to provide the huge pages information to the container.

The Network Resources Injector adds a volume that is named `podnetinfo` and is mounted at `/etc/podnetinfo` for each container in the pod. The volume uses the Downward API and includes a file for huge pages requests and limits. The file naming convention is as follows:
Expand Down
113 changes: 49 additions & 64 deletions modules/nw-sriov-networknodepolicy-object.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[id="nw-sriov-networknodepolicy-object_{context}"]
= SR-IOV network node configuration object

[role="_abstract"]
You specify the SR-IOV network device configuration for a node by creating an SR-IOV network node policy. The API object for the policy is part of the `sriovnetwork.openshift.io` API group.

The following YAML describes an SR-IOV network node policy:
Expand All @@ -15,49 +16,46 @@ The following YAML describes an SR-IOV network node policy:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: <name> <1>
namespace: openshift-sriov-network-operator <2>
name: <name>
namespace: openshift-sriov-network-operator
spec:
resourceName: <sriov_resource_name> <3>
resourceName: <sriov_resource_name>
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true" <4>
priority: <priority> <5>
mtu: <mtu> <6>
needVhostNet: false <7>
numVfs: <num> <8>
externallyManaged: false <9>
nicSelector: <10>
vendor: "<vendor_code>" <11>
deviceID: "<device_id>" <12>
pfNames: ["<pf_name>", ...] <13>
rootDevices: ["<pci_bus_id>", ...] <14>
netFilter: "<filter_string>" <15>
deviceType: <device_type> <16>
isRdma: false <17>
linkType: <link_type> <18>
eSwitchMode: "switchdev" <19>
excludeTopology: false <20>
feature.node.kubernetes.io/network-sriov.capable: "true"
priority: <priority>
mtu: <mtu>
needVhostNet: false
numVfs: <num>
externallyManaged: false
nicSelector:
vendor: "<vendor_code>"
deviceID: "<device_id>"
pfNames: ["<pf_name>", ...]
rootDevices: ["<pci_bus_id>", ...]
netFilter: "<filter_string>"
deviceType: <device_type>
isRdma: false
linkType: <link_type>
eSwitchMode: "switchdev"
excludeTopology: false
----
<1> The name for the custom resource object.

<2> The namespace where the SR-IOV Network Operator is installed.
where:

<3> The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
+
When specifying a name, be sure to use the accepted syntax expression `^[a-zA-Z0-9_]+$` in the `resourceName`.

<4> The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
--
`metadata.name`:: Specifies the name for the custom resource object.
`metadata.namespace`:: Specifies the namespace where the SR-IOV Network Operator is installed.
`spec.resourceName`:: Specifies the resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name. When specifying a name, be sure to use the accepted syntax expression `^[a-zA-Z0-9_]+$`.
`spec.nodeSelector`:: Specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
+
[IMPORTANT]
====
The SR-IOV Network Operator applies node network configuration policies to nodes in sequence. Before applying node network configuration policies, the SR-IOV Network Operator checks if the machine config pool (MCP) for a node is in an unhealthy state such as `Degraded` or `Updating`. If a node is in an unhealthy MCP, the process of applying node network configuration policies to all targeted nodes in the cluster pauses until the MCP returns to a healthy state.

To avoid a node in an unhealthy MCP from blocking the application of node network configuration policies to other nodes, including nodes in other MCPs, you must create a separate node network configuration policy for each MCP.
====

<5> Optional: The priority is an integer value between `0` and `99`. A smaller value receives higher priority. For example, a priority of `10` is a higher priority than `99`. The default value is `99`.

<6> Optional: The maximum transmission unit (MTU) of the physical function and all its virtual functions. The maximum MTU value can vary for different network interface controller (NIC) models.
`spec.priority`:: Optional: Specifies the priority as an integer value between `0` and `99`. A smaller value receives higher priority. For example, a priority of `10` is a higher priority than `99`. The default value is `99`.
`spec.mtu`:: Optional: Specifies the maximum transmission unit (MTU) of the physical function and all its virtual functions. The maximum MTU value can vary for different network interface controller (NIC) models.
+
[IMPORTANT]
====
Expand All @@ -66,57 +64,44 @@ If you want to create virtual function on the default network interface, ensure
If you want to modify the MTU of a single virtual function while the function is assigned to a pod, leave the MTU value blank in the SR-IOV network node policy.
Otherwise, the SR-IOV Network Operator reverts the MTU of the virtual function to the MTU value defined in the SR-IOV network node policy, which might trigger a node drain.
====

<7> Optional: Set `needVhostNet` to `true` to mount the `/dev/vhost-net` device in the pod. Use the mounted `/dev/vhost-net` device with Data Plane Development Kit (DPDK) to forward traffic to the kernel network stack.

<8> The number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than `127`.

<9> The `externallyManaged` field indicates whether the SR-IOV Network Operator manages all, or only a subset of virtual functions (VFs). With the value set to `false` the SR-IOV Network Operator manages and configures all VFs on the PF.
`spec.needVhostNet`:: Optional: Set to `true` to mount the `/dev/vhost-net` device in the pod. Use the mounted `/dev/vhost-net` device with Data Plane Development Kit (DPDK) to forward traffic to the kernel network stack.
`spec.numVfs`:: Specifies the number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than `127`.
`spec.externallyManaged`:: Indicates whether the SR-IOV Network Operator manages all, or only a subset of virtual functions (VFs). With the value set to `false` the SR-IOV Network Operator manages and configures all VFs on the PF.
+
[NOTE]
====
When `externallyManaged` is set to `true`, you must manually create the Virtual Functions (VFs) on the physical function (PF) before applying the `SriovNetworkNodePolicy` resource. If the VFs are not pre-created, the SR-IOV Network Operator's webhook will block the policy request.

When `externallyManaged` is set to `false`, the SR-IOV Network Operator automatically creates and manages the VFs, including resetting them if necessary.

To use VFs on the host system, you must create them through NMState, and set `externallyManaged` to `true`. In this mode, the SR-IOV Network Operator does not modify the PF or the manually managed VFs, except for those explicitly defined in the `nicSelector` field of your policy. However, the SR-IOV Network Operator continues to manage VFs that are used as pod secondary interfaces.
To use VFs on the host system, you must create them through NMState, and set `externallyManaged` to `true`. In this mode, the SR-IOV Network Operator does not modify the PF or the manually managed VFs, except for those explicitly defined in the `nicSelector` field of your policy. However, the SR-IOV Network Operator continues to manage VFs that are used as pod secondary interfaces.
====

<10> The NIC selector identifies the device to which this resource applies. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
`spec.nicSelector`:: Identifies the device to which this resource applies. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
+
If you specify `rootDevices`, you must also specify a value for `vendor`, `deviceID`, or `pfNames`. If you specify both `pfNames` and `rootDevices` at the same time, ensure that they refer to the same device. If you specify a value for `netFilter`, then you do not need to specify any other parameter because a network ID is unique.

<11> Optional: The vendor hexadecimal vendor identifier of the SR-IOV network device. The only allowed values are `8086` (Intel) and `15b3` (Mellanox).

<12> Optional: The device hexadecimal device identifier of the SR-IOV network device. For example, `101b` is the device ID for a Mellanox ConnectX-6 device.

<13> Optional: An array of one or more physical function (PF) names the resource must apply to.

<14> Optional: An array of one or more PCI bus addresses the resource must apply to. For example `0000:02:00.1`.

<15> Optional: The platform-specific network filter. The only supported platform is {rh-openstack-first}. Acceptable values use the following format: `openstack/NetworkID:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`. Replace `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` with the value from the `/var/config/openstack/latest/network_data.json` metadata file. This filter ensures that VFs are associated with a specific OpenStack network. The operator uses this filter to map the VFs to the appropriate network based on metadata provided by the OpenStack platform.

<16> Optional: The driver to configure for the VFs created from this resource. The only allowed values are `netdevice` and `vfio-pci`. The default value is `netdevice`.
`spec.nicSelector.vendor`:: Optional: Specifies the vendor hexadecimal identifier of the SR-IOV network device. The only allowed values are `8086` (Intel) and `15b3` (Mellanox).
`spec.nicSelector.deviceID`:: Optional: Specifies the device hexadecimal identifier of the SR-IOV network device. For example, `101b` is the device ID for a Mellanox ConnectX-6 device.
`spec.nicSelector.pfNames`:: Optional: Specifies an array of one or more physical function (PF) names the resource must apply to.
`spec.nicSelector.rootDevices`:: Optional: Specifies an array of one or more PCI bus addresses the resource must apply to. For example `0000:02:00.1`.
`spec.nicSelector.netFilter`:: Optional: Specifies the platform-specific network filter. The only supported platform is {rh-openstack-first}. Acceptable values use the following format: `openstack/NetworkID:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`. Replace `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` with the value from the `/var/config/openstack/latest/network_data.json` metadata file. This filter ensures that VFs are associated with a specific OpenStack network. The operator uses this filter to map the VFs to the appropriate network based on metadata provided by the OpenStack platform.
Comment thread
kquinn1204 marked this conversation as resolved.
`spec.deviceType`:: Optional: Specifies the driver to configure for the VFs created from this resource. The only allowed values are `netdevice` and `vfio-pci`. The default value is `netdevice`.
+
For a Mellanox NIC to work in DPDK mode on bare metal nodes, use the `netdevice` driver type and set `isRdma` to `true`.

<17> Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is `false`.
For a Mellanox NIC to work in DPDK mode on bare-metal nodes, use the `netdevice` driver type and set `isRdma` to `true`.
`spec.isRdma`:: Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is `false`.
+
If the `isRdma` parameter is set to `true`, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.
+
Set `isRdma` to `true` and additionally set `needVhostNet` to `true` to configure a Mellanox NIC for use with Fast Datapath DPDK applications.
Set `isRdma` to `true` and additionally set `needVhostNet` to `true` to configure a Mellanox NIC for use with Fast data path DPDK applications.
+
[NOTE]
====
You cannot set the `isRdma` parameter to `true` for intel NICs.
You cannot set the `isRdma` parameter to `true` for Intel NICs.
====

<18> Optional: The link type for the VFs. The default value is `eth` for Ethernet. Change this value to 'ib' for InfiniBand.
`spec.linkType`:: Optional: Specifies the link type for the VFs. The default value is `eth` for Ethernet. Change this value to 'ib' for InfiniBand.
+
When `linkType` is set to `ib`, `isRdma` is automatically set to `true` by the SR-IOV Network Operator webhook. When `linkType` is set to `ib`, `deviceType` should not be set to `vfio-pci`.
+
Do not set linkType to `eth` for SriovNetworkNodePolicy, because this can lead to an incorrect number of available devices reported by the device plugin.

<19> Optional: To enable hardware offloading, you must set the `eSwitchMode` field to `"switchdev"`. For more information about hardware offloading, see "Configuring hardware offloading".

<20> Optional: To exclude advertising an SR-IOV network resource's NUMA node to the Topology Manager, set the value to `true`. The default value is `false`.
Do not set `linkType` to `eth` for `SriovNetworkNodePolicy`, because this can lead to an incorrect number of available devices reported by the device plugin.
`spec.eSwitchMode`:: Optional: Set to `"switchdev"` to enable hardware offloading. For more information about hardware offloading, see "Configuring hardware offloading".
`spec.excludeTopology`:: Optional: Set to `true` to exclude advertising an SR-IOV network resource's NUMA node to the Topology Manager. The default value is `false`.
--
Loading