Linux* Base Drivers for Intel® Ethernet Network Connection

NOTE: This release includes the following Linux* Base Drivers for Intel® Ethernet Network Connection: igb, e1000e and igbvf.
  • The igb driver supports all 82575-, 82576-, 82580-, I210-, I211-, and I350-based gigabit network connections.
  • The igbvf driver supports 82576-based virtual function devices that can only be activated on kernels that support SR-IOV. SR-IOV requires the correct platform and OS support.
  • The e1000e driver supports all PCI Express gigabit network connections, except those that are 82575-, 82576-, 82580-, and I350-based.

NOTE: As of July 2012, the e1000 legacy driver is no longer supported and is only maintained through the upstream kernel.

NOTE: The Intel® PRO/1000 P Dual Port Server Adapter is supported by the e1000 driver, not the e1000e driver, because the 82546 part is used behind a PCI Express bridge.

First identify your adapter.  Then follow the appropriate steps for building, installing, and configuring the driver.

UPGRADING: If you currently have the e1000 driver installed and need to install e1000e, perform the following:

  • If your version of e1000 is 7.6.15.5 or less, upgrade to e1000 version 8.x.
  • Install the e1000e driver using the instructions in the e1000e section.
  • Modify /etc/modprobe.conf to point your PCIe devices to the new e1000e driver by using alias ethX e1000e, or use your distribution's specific method for configuring network adapters, such as Red Hat's setup/system-config-network or SuSE's yast2.
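
The alias change in the last step could be scripted roughly as follows. This is a minimal sketch, not a supported procedure: the function name switch_alias is hypothetical, it assumes GNU sed, and it rewrites every "alias ethX e1000" line in the file it is given, so it assumes all of those ports are PCIe devices. Try it on a copy of the file first.

```shell
# Hypothetical helper: rewrite "alias ethX e1000" lines to use e1000e.
# $1 = path to a modprobe.conf-style file (use a copy while testing).
switch_alias() {
    sed -E -i 's/^(alias[[:space:]]+eth[0-9]+[[:space:]]+)e1000[[:space:]]*$/\1e1000e/' "$1"
}

# Demonstration on a scratch file rather than /etc/modprobe.conf.
demo=$(mktemp)
printf 'alias eth0 e1000\nalias eth1 igb\n' > "$demo"
switch_alias "$demo"
cat "$demo"    # eth0 now points at e1000e, eth1 is left alone
rm -f "$demo"
```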

Identifying Your Adapter

First identify your adapter.  Then select the name of the specified base driver: igb or e1000e.
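
As a rough illustration of the driver selection described in the notes at the top of this document, the mapping can be written as a small shell function. The function name pick_driver is hypothetical, and the fallback branch assumes the controller is some other PCI Express gigabit part; legacy e1000 devices and igbvf virtual functions are not handled by this sketch.

```shell
# Hypothetical helper: map a controller model string to a base driver name.
# $1 = controller model, e.g. 82576 or I350.
pick_driver() {
    case "$1" in
        82575*|82576*|82580*|I210*|I211*|I350*)
            echo igb ;;
        *)
            # Assumption: any other model passed in is a PCI Express
            # gigabit controller, which this release covers with e1000e.
            echo e1000e ;;
    esac
}

pick_driver 82576    # prints: igb
pick_driver 82574    # prints: e1000e
```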

For more information on how to identify your adapter, go to the Adapter & Driver ID Guide at:

http://support.intel.com/support/go/network/adapter/idguide.htm

For the latest Intel network drivers for Linux, refer to the following website. Select the link for your adapter.  

http://support.intel.com/support/go/network/adapter/home.htm


Using the igb Base Driver

Overview

Building and Installation

Command Line Parameters

Additional Configurations

Known Issues

Overview

The Linux base drivers support the 2.4.x, 2.6.x, and 3.x kernels. These drivers include support for Itanium® 2-based systems.

These drivers are only supported as a loadable module. Intel is not supplying patches against the kernel source to allow for static linking of the drivers. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

The following features are now available in supported kernels:

Channel Bonding documentation can be found in the Linux kernel source: /Documentation/networking/bonding.txt

The igb driver supports IEEE 1588 time stamping for kernels 2.6.30 and above.

The driver information previously displayed in the /proc file system is not supported in this release. Alternatively, you can use ethtool (version 1.6 or later), lspci, and ifconfig to obtain the same information. Instructions on updating ethtool can be found in the section Additional Configurations later in this document.


Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb igb.tar.gz'.

NOTES:
  • For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

  • RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/igb' or '/usr/local/src/igb'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf igb-<x.x.x>.tar.gz

  3. Change to the driver src directory, where <x.x.x> is the version number for the driver tar:

    cd igb-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igb/igb.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using the modprobe command:

    modprobe igb

    With 2.6-based kernels, also make sure that older igb drivers are removed from the kernel before loading the new module:

    rmmod igb; modprobe igb

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>
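
Steps 2 through 5 can be collected into one script. The following is only a sketch under stated assumptions: the VERSION and RUN variables and the build_igb function are hypothetical names, the tar file is assumed to be in the current directory, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of steps 2-5. VERSION is a placeholder for the driver version.
VERSION=${VERSION:-x.x.x}
# RUN prefixes every command. The default "echo" only prints the commands;
# set RUN to an empty string (RUN= ) to actually execute them as root.
RUN=${RUN-echo}

build_igb() {
    $RUN tar zxf "igb-$VERSION.tar.gz" &&
    $RUN cd "igb-$VERSION/src/" &&
    $RUN make install &&
    $RUN modprobe igb
}

build_igb
```

Printing the commands first makes it easy to review them before rerunning with RUN set to an empty string.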

TROUBLESHOOTING: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

# make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver will generate an interrupt every two seconds, so if you can see that you're no longer getting interrupts in cat /proc/interrupts for the ethX igb device, then this workaround may be necessary.
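
One way to check whether interrupts are still arriving is to total the counters for the device twice, a few seconds apart. The helper below is a sketch: irq_total is a hypothetical name, and it assumes the usual /proc/interrupts layout in which the per-CPU counts follow the IRQ number on each line.

```shell
# Sum the per-CPU interrupt counts on every line that mentions a device.
# $1 = device name (for example eth0)
# $2 = file to read (defaults to /proc/interrupts)
irq_total() {
    awk -v dev="$1" '
        index($0, dev) {
            # Field 1 is the IRQ number; the numeric fields after it
            # are the per-CPU counts.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}

# Example use (not run here):
#   before=$(irq_total eth0); sleep 5; after=$(irq_total eth0)
#   [ "$after" -gt "$before" ] || echo "no new interrupts on eth0"
```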

To build igb driver with DCA:

If your kernel supports DCA, the driver will build by default with DCA enabled.


Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe igb [<option>=<VAL1>,<VAL2>,...]

There needs to be a <VAL#> for each network port in the system supported by this driver. The values will be applied to each instance, in function order. For example:

modprobe igb InterruptThrottleRate=16000,16000

In this case, there are two network ports supported by igb in the system. The default value for each parameter is generally the recommended setting, unless otherwise noted.
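
When the same value should apply to every port, the comma-separated list can be generated instead of typed. This is a small sketch; the per_port function name is hypothetical and the port count is something you supply yourself.

```shell
# Repeat one value for each port, joined by commas.
# $1 = value, $2 = number of igb ports in the system
per_port() {
    value=$1
    ports=$2
    out=
    i=0
    while [ "$i" -lt "$ports" ]; do
        out="${out:+$out,}$value"
        i=$((i + 1))
    done
    echo "$out"
}

# Prints the command line from the example above.
echo "modprobe igb InterruptThrottleRate=$(per_port 16000 2)"
```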

NOTES:
  • For more information about the AutoNeg, Duplex, and Speed parameters, see the Speed and Duplex Configuration section in this document.

  • For more information about the InterruptThrottleRate, RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Each parameter below is listed with its valid range/settings, default value, and description.
InterruptThrottleRate
Valid Range: 0, 1, 3, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
Default Value: 3

The driver can limit the number of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum number of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater than or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behaviour of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency. The hardware can handle many more small packets per second, however, and for this reason an adaptive interrupt moderation algorithm was implemented.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and RxAbsIntDelay parameters. In other words, minimizing the receive and/or transmit absolute delays does not force the controller to generate more interrupts than what the Interrupt Throttle Rate allows.
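
To get a feel for the latency cost of a given rate, it can help to convert it into the minimum spacing between interrupts. The arithmetic below is only an illustration: itr_interval_us is a hypothetical name, the result is rounded down to whole microseconds, and it assumes interrupts are evenly spaced at the configured cap.

```shell
# Minimum spacing between interrupts, in microseconds, for a fixed rate.
# $1 = InterruptThrottleRate value (interrupts per second, 100 or more)
itr_interval_us() {
    echo $((1000000 / $1))
}

# The rates mentioned above: 4000 and 20000 (mode 3), 8000 (old static
# default), and 70000 (mode 1 upper limit).
for rate in 4000 8000 20000 70000; do
    echo "InterruptThrottleRate=$rate -> $(itr_interval_us "$rate") us"
done
```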

LLIPort
Valid Range: 0-65535
Default Value: 0 (disabled)

LLI (Low Latency Interrupts): LLI allows for immediate generation of an interrupt upon processing receive packets that match certain criteria as set by the parameters described below. LLI parameters are not enabled when Legacy interrupts are used. You must be using MSI or MSI-X (see cat /proc/interrupts) to successfully use LLI.

LLI is configured with the LLIPort command-line parameter, which specifies which TCP port should generate Low Latency Interrupts.

For example, using LLIPort=80 would cause the board to generate an immediate interrupt upon receipt of any packet sent to TCP port 80 on the local machine.

CAUTION: Enabling LLI can result in an excessive number of interrupts/second that may cause problems with the system and in some cases may cause a kernel panic.

LLIPush
Valid Range: 0-1
Default Value: 0 (disabled)

LLIPush can be set to be enabled or disabled (default). It is most effective in an environment with many small transactions.

NOTE: Enabling LLIPush may allow a denial of service attack.

LLISize
Valid Range: 0-1500
Default Value: 0 (disabled)

LLISize causes an immediate interrupt if the board receives a packet smaller than the specified size.

IntMode
Valid Range: 0-2 (0 = Legacy Int, 1 = MSI and 2 = MSI-X)
Default Value: 2

IntMode allows load-time control over the type of interrupt the driver registers for. MSI-X is required for multiple queue support, and some kernels and combinations of kernel .config options will force a lower level of interrupt support. 'cat /proc/interrupts' will show different values for each type of interrupt.

RSS
Valid Range: 0-8
Default Value: 1

0 - Assign up to whichever is less: the number of CPUs or the number of queues.
X - Assign X queues, where X is less than or equal to the maximum number of queues. The driver allows the maximum supported queue value. For example, I350-based adapters allow RSS=8 (where 8 queues is the maximum allowable).

NOTE: For 82575-based adapters the maximum number of queues is 4; for 82576-based and newer adapters it is 8.

This parameter is also affected by the VMDq parameter in that it will limit the queues more.

                VMDQ
Model     0     1     2     3+
82575     4     4     3     1
82576     8     2     2     2
82580     8     1     1     1
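
The table above can be turned into a lookup. This sketch assumes that each cell is the number of queues left for RSS at the given VMDQ setting, which is how the surrounding text reads; the function name max_rss_queues is hypothetical.

```shell
# Look up a cell of the table above.
# $1 = controller model (82575, 82576 or 82580), $2 = VMDQ value
max_rss_queues() {
    model=$1
    vmdq=$2
    # The last column of the table covers every VMDQ value of 3 or more.
    if [ "$vmdq" -ge 3 ]; then
        vmdq=3
    fi
    case "$model:$vmdq" in
        82575:0|82575:1) echo 4 ;;
        82575:2)         echo 3 ;;
        82575:3)         echo 1 ;;
        82576:0|82580:0) echo 8 ;;
        82576:*)         echo 2 ;;
        82580:*)         echo 1 ;;
        *) echo "unknown model: $model" >&2; return 1 ;;
    esac
}

max_rss_queues 82576 1    # prints: 2
```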

VMDQ
Valid Range: 0-4 on 82575-based adapters; 0-8 on 82576/82580-based adapters
  0 = disabled
  1 = sets the netdev as pool 0
  2+ = add additional queues, but they currently are not used
Default Value: 0

Supports enabling VMDq pools as this is needed to support SR-IOV.

This parameter is forced to 1 or more if the max_vfs module parameter is used. In addition, the number of queues available for RSS is limited if this is set to 1 or greater.

NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering and VLAN tag stripping/insertion will remain enabled.

max_vfs
Valid Range: 0-7
Default Value: 0

This parameter adds support for SR-IOV. It causes the driver to spawn up to max_vfs virtual functions. If the value is greater than 0, it will also force the VMDq parameter to be 1 or more.

QueuePairs
Valid Range: 0-1
Default Value: 1 (TX and RX will be paired onto one interrupt vector)

If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy separate vectors.

This option can be overridden to 1 if there are not sufficient interrupts available. This can occur if any combination of RSS, VMDQ, and max_vfs results in more than 4 queues being used.

Node
Valid Range: 0-n
  0 - n: where n is the number of the NUMA node that should be used to allocate memory for this adapter port.
  -1: uses the driver default of allocating memory on whichever processor is running modprobe.
Default Value: -1 (off)

The Node parameter will allow you to pick which NUMA node you want to have the adapter allocate memory from. All driver structures, in-memory queues, and receive buffers will be allocated on the node specified. This parameter is only useful when interrupt affinity is specified, otherwise some portion of the time the interrupt could run on a different core than the memory is allocated on, causing slower memory access and impacting throughput, CPU, or both.

EEE
Valid Range: 0-1
Default Value: 1 (enabled)

A link between two EEE-compliant devices will result in periodic bursts of data followed by periods where the link is in an idle state. This Low Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds.

NOTE: EEE support requires autonegotiation.

DMAC
Valid Range: 0, 250, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000
Default Value: 0 (disabled)

Enables or disables the DMA Coalescing feature. Values are in microseconds and increase the DMA Coalescing feature's internal timer. DMA (Direct Memory Access) allows the network device to move packet data directly to the system's memory, reducing CPU utilization. However, the frequency and random intervals at which packets arrive do not allow the system to enter a lower power state. DMA Coalescing allows the adapter to collect packets before it initiates a DMA event. This may increase network latency but also increases the chances that the system will enter a lower power state.

Turning on DMA Coalescing may save energy with kernel 2.6.32 and later. This will impart the greatest chance for your system to consume less power. DMA Coalescing is effective in helping to save platform power only when it is enabled across all active ports.

InterruptThrottleRate (ITR) should be set to dynamic. When ITR=0, DMA Coalescing is automatically disabled.

A whitepaper containing information on how to best configure your platform is available on the Intel website.

MDD (Malicious Driver Detection)
Valid Range: 0-1 (0 = Disable, 1 = Enable)
Default Value: 1

This parameter is only relevant for I350 devices operating in SR-IOV mode. When this parameter is set, the driver detects a malicious VF driver and disables its TX/RX queues until a VF driver reset occurs.


Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for this family of adapters is igb.

As an example, if you install the igb driver for two Gigabit adapters (eth0 and eth1) and want to set the interrupt mode to MSI-X and MSI respectively, add the following to /etc/modules.conf or /etc/modprobe.conf:

alias eth0 igb
alias eth1 igb
options igb IntMode=2,1

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set the dmesg console log level to eight by entering the following:

dmesg -n 8
NOTE: This setting is not saved across reboots.

Jumbo Frames

Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) to a value larger than the default value of 1500. Use the ifconfig command to increase the MTU size. For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. The setting change can be made permanent by adding MTU=9000 to the file: /etc/sysconfig/network-scripts/ifcfg-eth<x> (Red Hat distributions). Other distributions may store this setting in a different location.
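
On a Red Hat style system, the permanent change could be scripted along these lines. This is a sketch only: set_mtu is a hypothetical helper, it assumes GNU sed, and it edits whatever file you pass it, so point it at a copy of the ifcfg file while testing.

```shell
# Set MTU=<value> in an ifcfg-style file, replacing an existing MTU line
# or appending one if none is present.
# $1 = path to the ifcfg file, $2 = MTU value
set_mtu() {
    file=$1
    mtu=$2
    if grep -q '^MTU=' "$file"; then
        sed -i "s/^MTU=.*/MTU=$mtu/" "$file"
    else
        echo "MTU=$mtu" >> "$file"
    fi
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'DEVICE=eth0\nONBOOT=yes\n' > "$demo"
set_mtu "$demo" 9000
cat "$demo"
rm -f "$demo"
```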

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234.

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/.

Speed and Duplex Configuration

In addressing speed and duplex configuration issues, you need to distinguish between copper-based adapters and fiber-based adapters.

In the default mode, an Intel® Network Adapter using copper connections will attempt to auto-negotiate with its link partner to determine the best setting. If the adapter cannot establish link with the link partner using auto-negotiation, you may need to manually configure the adapter and link partner to identical settings to establish link and pass packets. This should only be needed when attempting to link with an older switch that does not support auto-negotiation or one that has been forced to a specific speed or duplex mode. Your link partner must match the setting you choose.

Speed and Duplex are configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 6.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

CAUTION: Only experienced network administrators should force speed and duplex manually. The settings at the switch must always match the adapter settings. Adapter performance may suffer or your adapter may not operate if you configure the adapter differently from your switch.
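
For reference, forcing a copper link with ethtool usually takes the form shown below. The helper is a sketch with a hypothetical name (force_link_cmd); it only prints the command so that it can be reviewed first, and it deliberately handles just 10 and 100 Mbps. The link partner must be configured to the same values.

```shell
# Print the ethtool command that forces speed and duplex on a device.
# $1 = device, $2 = speed in Mbps (10 or 100), $3 = duplex (half or full)
force_link_cmd() {
    case "$2" in
        10|100) ;;
        *) echo "unsupported speed in this sketch: $2" >&2; return 1 ;;
    esac
    case "$3" in
        half|full) ;;
        *) echo "duplex must be half or full: $3" >&2; return 1 ;;
    esac
    echo "ethtool -s $1 speed $2 duplex $3 autoneg off"
}

force_link_cmd eth0 100 full
# prints: ethtool -s eth0 speed 100 duplex full autoneg off
```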

An Intel® Network Adapter using fiber-based connections, however, will not attempt to auto-negotiate with its link partner since those adapters operate only in full duplex, and only at their native speed.

Enabling Wake on LAN* (WoL)

WoL is configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

For instructions on enabling WoL with ethtool, refer to the website listed above.

WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the driver must be loaded prior to shutting down or suspending the system.
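
As a hedged illustration, ethtool normally enables wake on MagicPacket with "ethtool -s ethX wol g" and reports the current state on the "Wake-on:" line of "ethtool ethX". The filter below, with the hypothetical name wol_setting, extracts that state from ethtool output; the exact output layout is assumed to match common ethtool versions.

```shell
# Read "ethtool ethX" output on stdin and print the current Wake-on value
# (for example "g" for MagicPacket or "d" for disabled).
wol_setting() {
    # The "Supports Wake-on:" line is skipped because its first field
    # is "Supports".
    awk '$1 == "Wake-on:" { print $2 }'
}

# Example use (not run here):
#   ethtool -s eth0 wol g
#   ethtool eth0 | wol_setting
```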

NOTES: Wake On LAN is only supported on port A of multi-port devices.

Wake On LAN is not supported for the Intel® Gigabit VT Quad Port Server Adapter.

Multiqueue

In this mode, a separate MSI-X vector is allocated for each queue and one for "other" interrupts such as link status change and errors. All interrupts are throttled via interrupt moderation. Interrupt moderation must be used to avoid interrupt storms while the driver is processing one interrupt. The moderation value should be at least as large as the expected time for the driver to process an interrupt. Multiqueue is off by default.

Requirements: MSI-X support is required for Multiqueue. If MSI-X is not found, the system will fall back to MSI or to Legacy interrupts. This driver supports multiqueue in kernel versions 2.6.24 and greater. This driver supports receive multiqueue on all kernels that support MSI-X.

NOTES: Do not use MSI-X with the 2.6.19 or 2.6.20 kernels.

On some kernels a reboot is required to switch between a single queue mode and multiqueue modes, or vice-versa.

LRO

Large Receive Offload (LRO) is a technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. LRO combines multiple Ethernet frames into a single receive in the stack, thereby potentially decreasing CPU utilization for receives.

NOTE: LRO requires 2.4.22 or later kernel version.

IGB_LRO is a compile time flag. The user can enable it at compile time to add support for LRO from the driver. The flag is used by adding CFLAGS_EXTRA="-DIGB_LRO" to the make file when it's being compiled.

# make CFLAGS_EXTRA="-DIGB_LRO" install


You can verify that the driver is using LRO by looking at these counters in ethtool:

lro_aggregated - count of total packets that were combined
lro_flushed - counts the number of packets flushed out of LRO
lro_recycled - reflects the number of buffers returned to the ring from recycling
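
The counters can be picked out of the ethtool statistics output with a small filter. This is a sketch: lro_counters is a hypothetical name, and it assumes the usual "name: value" layout printed by "ethtool -S".

```shell
# Read "ethtool -S ethX" output on stdin and keep only the LRO counters.
lro_counters() {
    grep -E '^[[:space:]]*lro_(aggregated|flushed|recycled):'
}

# Example use (not run here):
#   ethtool -S eth0 | lro_counters
```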

NOTE: IPv6 and UDP are not supported by LRO.

IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)

Precision Time Protocol (PTP) is an implementation of the IEEE 1588
specification allowing network cards to synchronize their clocks over a
PTP-enabled network. It works through a series of synchronization and delay
notification transactions that allow a software daemon to implement a PID
controller to synchronize the network card clocks.

NOTE: PTP requires 3.0.0 or later kernel version that has PTP support
enabled in the kernel and a user-space software daemon.

IGB_PTP is a compile time flag. The user can enable it at compile time to add
support for PTP from the driver. The flag is used by adding
CFLAGS_EXTRA="-DIGB_PTP" to the make file when it's being compiled:

make CFLAGS_EXTRA="-DIGB_PTP" install

NOTE: The driver will fail to compile if your kernel does not
support PTP.

You can verify that the driver is using PTP by looking at the system log to
see whether the driver attempted to register a PHC. If you have a kernel
and version of ethtool with PTP support, you can check the PTP support in the
driver by executing:

ethtool -T ethX

MAC and VLAN anti-spoofing feature

When a malicious driver attempts to send a spoofed packet, it is dropped by the hardware and not transmitted. An interrupt is sent to the PF driver notifying it of the spoof attempt.
When a spoofed packet is detected the PF driver will send the following message to the system log (displayed by the "dmesg" command):

Spoof event(s) detected on VF(n)
Where n = the VF that attempted to do the spoofing.

Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool

You can set a MAC address of a Virtual Function (VF), a default VLAN and the rate limit using the IProute2 tool. Download the latest version of the iproute2 tool from Sourceforge if your version does not have all the features you require.
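
With a recent iproute2, these settings are typically applied through "ip link set <PF> vf <N> ...". The sketch below prints the commands by default; the RUN variable, the configure_vf function, and all example values (eth0, VF 0, the MAC address, VLAN 100, rate 500) are placeholders chosen for illustration.

```shell
# RUN prefixes every command. The default "echo" only prints the commands;
# set RUN to an empty string (RUN= ) to execute them as root.
RUN=${RUN-echo}

# $1 = PF interface, $2 = VF number, $3 = MAC address,
# $4 = default VLAN ID, $5 = transmit rate limit in Mbps
configure_vf() {
    pf=$1; vf=$2; mac=$3; vlan=$4; rate=$5
    $RUN ip link set "$pf" vf "$vf" mac "$mac" &&
    $RUN ip link set "$pf" vf "$vf" vlan "$vlan" &&
    $RUN ip link set "$pf" vf "$vf" rate "$rate"
}

configure_vf eth0 0 00:11:22:33:44:55 100 500
```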

Known Issues

NOTE: After installing the driver, if your Intel Ethernet Network Connection is not working, verify that you have installed the correct driver.

Using the igb driver on 2.4 or older 2.6 based kernels

Due to limited support for PCI-Express in 2.4 kernels and older 2.6 kernels, the igb driver may run into interrupt related problems on some systems, such as no link or hang when bringing up the device.

We recommend the newer 2.6 based kernels, as these kernels correctly configure the PCI-Express configuration space of the adapter and all intervening bridges. If you are required to use a 2.4 kernel, use a 2.4 kernel newer than 2.4.30. For 2.6 kernels we recommend using the 2.6.21 kernel or newer.

Alternatively, on 2.6 kernels you may disable MSI support in the kernel by booting with the "pci=nomsi" option or permanently disable MSI support in your kernel by configuring your kernel with CONFIG_PCI_MSI unset.

Intel® Active Management Technology 2.0, 2.1, 2.5 not supported in conjunction with Linux driver

Detected Tx Unit Hang in Quad Port Adapters

In some cases ports 3 and 4 don't pass traffic and report 'Detected Tx Unit Hang' followed by 'NETDEV WATCHDOG: ethX: transmit timed out' errors. Ports 1 and 2 don't show any errors and will pass traffic.

This issue MAY be resolved by updating to the latest kernel and BIOS. The user is encouraged to run an OS that fully supports MSI interrupts. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Compiling the Driver

When trying to compile the driver by running make install, the following error may occur:  "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

# make include/linux/version.h

Performance Degradation with Jumbo Frames

Degradation in throughput performance may be observed in some Jumbo frames environments. If this is observed, increasing the application's socket buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. See the specific application manual and /usr/src/linux*/Documentation/networking/ip-sysctl.txt for more details.

Jumbo frames on Foundry BigIron 8000 switch

There is a known issue using Jumbo frames when connected to a Foundry BigIron 8000 switch. This is a 3rd party limitation. If you experience loss of packets, lower the MTU size.

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

        echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

   or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).
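
If you choose the ARP filtering option, the permanent setting can be added without creating duplicate lines. This is a minimal sketch: persist_arp_filter is a hypothetical name, and an existing arp_filter line is left untouched even if it sets a different value.

```shell
# Append the arp_filter setting to a sysctl.conf-style file unless a line
# for that key is already present.
# $1 = file to edit (defaults to /etc/sysctl.conf)
persist_arp_filter() {
    conf=${1:-/etc/sysctl.conf}
    grep -Eq '^[[:space:]]*net\.ipv4\.conf\.all\.arp_filter[[:space:]]*=' "$conf" ||
        echo 'net.ipv4.conf.all.arp_filter = 1' >> "$conf"
}

# Demonstration on a scratch file; calling it twice adds only one line.
demo=$(mktemp)
persist_arp_filter "$demo"
persist_arp_filter "$demo"
cat "$demo"
rm -f "$demo"
```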

Disable rx flow control with ethtool

In order to disable receive flow control using ethtool, you must turn off auto-negotiation on the same command line.

For example:

     ethtool -A eth? autoneg off rx off

Unplugging network cable while ethtool -p is running

In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging the network cable while ethtool -p is running will cause the system to become unresponsive to keyboard commands, except for control-alt-delete. Restarting the system appears to be the only remedy.

Trouble passing traffic on ports 1 and 2 using RHEL3

There is a known hardware compatibility issue on some systems with RHEL3 kernels. Traffic on ports 1 and 2 may be slower than expected and ping times higher than expected.

This issue MAY be resolved by updating to the latest kernel and BIOS. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Do Not Use LRO When Routing Packets

Due to a known general compatibility issue with LRO and routing, do not use LRO when routing packets.

Build error with Asianux 3.0 - redefinition of typedef 'irq_handler_t'

Some systems may experience build issues due to redefinition of irq_handler_t. To resolve this issue build the driver (step 4 above) using the command:

# make CFLAGS_EXTRA=-DAX_RELEASE_CODE=1 install

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.
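
A quick check for the affected range might look like the following. It is a sketch: affected_kernel is a hypothetical name, and it assumes a version string shaped like the output of "uname -r" (for example 2.6.20-1.el5).

```shell
# Succeed if the kernel version is between 2.6.19 and 2.6.21 inclusive.
# $1 = kernel version string, e.g. "$(uname -r)"
affected_kernel() {
    ver=${1%%-*}            # drop any suffix such as "-1.el5"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    rest=${rest#*.}
    patch=${rest%%.*}       # third component, ignoring a fourth if present
    [ "$major" = 2 ] && [ "$minor" = 6 ] &&
        [ "$patch" -ge 19 ] && [ "$patch" -le 21 ]
}

if affected_kernel "2.6.20-1.el5"; then
    echo "kernel is in the 2.6.19 - 2.6.21 range"
fi
```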

Rx Page Allocation Errors

"Page allocation failure. order:0" errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Under Redhat 5.4-GA - System May Crash when Closing Guest OS Window after Loading/Unloading Physical Function (PF) Driver

Do not remove the igb driver from Dom0 while Virtual Functions (VFs) are assigned to guests. First either use the xm "pci-detach" command to hot-unplug each VF device from the VM it is assigned to, or shut down the VM.

SLES10 SP3 random system panic when reloading driver

This is a known SLES-10 SP3 issue. After requesting interrupts for MSI-X vectors, system may panic.

Currently the only known workaround is to build the drivers with CFLAGS_EXTRA=-DDISABLE_PCI_MSI if the driver needs to be loaded/unloaded. Otherwise the driver can be loaded once and will be safe, but unloading it will lead to the issue.

Enabling SR-IOV in a 32-bit Microsoft* Windows* Server 2008 Guest OS using Intel® 82576-based GbE or Intel® 82599-based 10GbE controller under KVM

KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This includes traditional PCIe devices, as well as SR-IOV-capable devices using Intel 82576-based and 82599-based controllers.

While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF) to a Linux-based VM running 2.6.32 or later kernel works fine, there is a known issue with Microsoft Windows Server 2008 VM that results in a "yellow bang" error. The problem lies within the KVM VMM itself, not in the Intel driver or the SR-IOV logic of the VMM: KVM emulates an older CPU model for the guests, and this older CPU model does not support MSI-X interrupts, which is a requirement for Intel SR-IOV.

If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode with KVM and a Microsoft Windows Server 2008 guest try the following workaround. The workaround is to tell KVM to emulate a different model of CPU when using qemu to create the KVM guest:

"-cpu qemu64,model=13"

Host May Reboot after Removing PF when VF is Active in Guest

Using kernel versions earlier than 3.2, do not unload the PF driver with active VFs. Doing this will cause your VFs to stop working until you reload the PF driver and may cause a spontaneous reboot of your system.


Using the e1000e Base Driver

Overview

Building and Installation

Command Line Parameters

Speed and Duplex Configuration

Additional Configurations

Known Issues

Overview

The Linux base drivers support the 2.4.x, 2.6.x, and 3.x kernels. These drivers include support for Itanium® 2-based systems.

These drivers are only supported as a loadable module. Intel is not supplying patches against the kernel source to allow for static linking of the drivers. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

The following features are now available in supported kernels:

Channel Bonding documentation can be found in the Linux kernel source: /Documentation/networking/bonding.txt

The driver information previously displayed in the /proc file system is not supported in this release. Alternatively, you can use ethtool (version 1.6 or later), lspci, and ifconfig to obtain the same information. Instructions on updating ethtool can be found in the section Additional Configurations later in this document.

NOTE: The Intel® 82562v 10/100 Network Connection only provides 10/100 support.

Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb e1000e.tar.gz'.

NOTES:
  • For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

  • RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/e1000e' or '/usr/local/src/e1000e'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf e1000e-<x.x.x>.tar.gz

  3. Change to the driver src directory, where <x.x.x> is the version number for the driver tar:

    cd e1000e-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/e1000e/e1000e.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using the modprobe command:

    modprobe e1000e

    On 2.6-based kernels, also make sure that older e1000e drivers are removed from the kernel before loading the new module:

    rmmod e1000e; modprobe e1000e

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>
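
Before running the steps above, it can help to confirm that a build tree exists for the running kernel. The following is a minimal sketch, assuming the common convention that /lib/modules/<KERNEL VERSION>/build points at the matching kernel sources or headers; the function names are hypothetical and some distributions may lay this out differently.

```sh
# Print the path where the build tree for kernel release $1
# (default: the running kernel) is conventionally linked.
kernel_build_dir() {
    echo "/lib/modules/${1:-$(uname -r)}/build"
}

# Succeeds when a build tree exists for the running kernel.
check_kernel_sources() {
    dir="$(kernel_build_dir)"
    if [ -d "$dir" ]; then
        echo "build tree found: $dir"
    else
        echo "no build tree at $dir; install matching kernel sources or headers" >&2
        return 1
    fi
}

# Example (not run here): check_kernel_sources && make install
```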

TROUBLESHOOTING: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

# make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver generates an interrupt every two seconds. If cat /proc/interrupts shows that the ethX e1000e device is no longer receiving interrupts, this workaround may be necessary.
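
One way to watch for the missing interrupts described above is to sum the counters for an interface twice and compare. This is a sketch only: the function name irq_total is hypothetical, and it assumes the usual /proc/interrupts layout of an IRQ label followed by one numeric column per CPU.

```sh
# Sum the per-CPU interrupt counts on every line of a /proc/interrupts-style
# file ($2, default /proc/interrupts) whose text contains the name $1.
irq_total() {
    awk -v name="$1" '
        index($0, name) {
            # Fields 2..N are per-CPU counters until the first non-numeric field.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]+$/) total += $i
                else break
            }
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}

# Example (not run here):
#   before=$(irq_total eth0); sleep 5; after=$(irq_total eth0)
#   [ "$after" -gt "$before" ] || echo "no new interrupts on eth0"
```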


Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe e1000e [<option>=<VAL1>,<VAL2>,...]

There needs to be a <VAL#> for each network port in the system supported by this driver. The values will be applied to each instance, in function order. For example:

modprobe e1000e InterruptThrottleRate=16000,16000

In this case, there are two network ports supported by e1000e in the system. The default value for each parameter is generally the recommended setting, unless otherwise noted.
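
Because one value is needed per port, the comma-separated list can be generated rather than typed. A minimal sketch, with the hypothetical helper per_port_values; it applies the same value to every port, which is only one possible policy.

```sh
# Print value $1 repeated $2 times, separated by commas (one entry per port).
per_port_values() {
    value="$1"; count="$2"; out=""
    i=0
    while [ "$i" -lt "$count" ]; do
        out="${out:+$out,}$value"
        i=$((i + 1))
    done
    echo "$out"
}

# Example (not run here):
#   modprobe e1000e InterruptThrottleRate="$(per_port_values 16000 2)"
```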

NOTES:
  • For more information about the InterruptThrottleRate, RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
InterruptThrottleRate
  Valid Range/Settings: 0, 1, 3, 4, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative, 4=simplified balancing)
  Default: 3
  Description: The driver can limit the number of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum number of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behavior of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

In simplified mode the interrupt rate is based on the ratio of tx and rx traffic. If the bytes per second rate is approximately equal, the interrupt rate will drop as low as 2000 interrupts per second. If the traffic is mostly transmit or mostly receive, the interrupt rate could be as high as 8000.

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and RxAbsIntDelay parameters. In other words, minimizing the receive and/or transmit absolute delays does not force the controller to generate more interrupts than what the Interrupt Throttle Rate allows.

NOTE: When e1000e is loaded with default settings and multiple adapters are in use simultaneously, the CPU utilization may increase non-linearly. In order to limit the CPU utilization without impacting the overall throughput, we recommend that you load the driver as follows:

modprobe e1000e InterruptThrottleRate=3000,3000,3000

This sets the InterruptThrottleRate to 3000 interrupts/sec for the first, second, and third instances of the driver. The range of 2000 to 3000 interrupts per second works on a majority of systems and is a good starting point, but the optimal value will be platform-specific. If CPU utilization is not a concern, use RX_POLLING (NAPI) and default driver settings.

RxIntDelay
  Valid Range/Settings: 0-65535 (0=off)
  Default: 0
  Description: This value delays the generation of receive interrupts in units of 1.024 microseconds. Receive interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. Increasing this value adds extra latency to frame reception and can end up decreasing the throughput of TCP traffic. If the system is reporting dropped receives, this value may be set too high, causing the driver to run out of available receive descriptors.

CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang (stop transmitting) under certain network conditions. If this occurs a NETDEV WATCHDOG message is logged in the system event log. In addition, the controller is automatically reset, restoring the network connection. To eliminate the potential for the hang ensure that RxIntDelay is set to zero.

RxAbsIntDelay
  Valid Range/Settings: 0-65535 (0=off)
  Default: 8
  Description: This value, in units of 1.024 microseconds, limits the delay in which a receive interrupt is generated. Useful only if RxIntDelay is non-zero, this value ensures that an interrupt is generated after the initial packet is received within the set amount of time. Proper tuning, along with RxIntDelay, may improve traffic throughput in specific network conditions.

TxIntDelay
  Valid Range/Settings: 0-65535 (0=off)
  Default: 8
  Description: This value delays the generation of transmit interrupts in units of 1.024 microseconds. Transmit interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. If the system is reporting dropped transmits, this value may be set too high, causing the driver to run out of available transmit descriptors.

TxAbsIntDelay
  Valid Range/Settings: 0-65535 (0=off)
  Default: 32
  Description: This value, in units of 1.024 microseconds, limits the delay in which a transmit interrupt is generated. Useful only if TxIntDelay is non-zero, this value ensures that an interrupt is generated after the initial packet is sent on the wire within the set amount of time. Proper tuning, along with TxIntDelay, may improve traffic throughput in specific network conditions.

copybreak
  Valid Range/Settings: 0-xxxxxxx (0=off)
  Default: 256
  Usage: modprobe e1000e copybreak=128
  Description: The driver copies all packets at or below this size to a fresh rx buffer before handing them up the stack. Unlike the other parameters, copybreak is a single value (not 1,1,1 etc.) applied to all driver instances, and it is also available at runtime at /sys/module/e1000e/parameters/copybreak.

SmartPowerDownEnable
  Valid Range/Settings: 0-1
  Default: 0 (disabled)
  Description: Allows Phy to turn off in lower power states. The user can turn off this parameter in supported chipsets.

KumeranLockLoss
  Valid Range/Settings: 0-1
  Default: 1 (enabled)
  Description: This workaround skips resetting the Phy at shutdown for the initial silicon releases of ICH8 systems.

IntMode
  Valid Range/Settings: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
  Default: 2 (MSI-X)
  Description: Allows changing the interrupt mode at module load time, without requiring a recompile. If the driver load fails to enable a specific interrupt mode, the driver will try other interrupt modes, from least to most compatible. The interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) interrupts, only MSI and Legacy will be attempted.

CrcStripping
  Valid Range/Settings: 0-1
  Default: 1 (enabled)
  Description: Strip the CRC from received packets before sending up the network stack. If you have a machine with a BMC enabled but cannot receive IPMI traffic after loading or enabling the driver, try disabling this feature.

EEE
  Valid Range/Settings: 0-1
  Default: 1 (enabled for parts supporting EEE)
  Description: This option allows for the ability of IEEE802.3az (a.k.a. Energy Efficient Ethernet or EEE) to be advertised to the link partner on parts supporting EEE. EEE saves energy by putting the device into a low-power state when the link is idle, but only when the link partner also supports EEE and after the feature has been enabled during link negotiation. It is not necessary to disable the advertisement of EEE when connected with a link partner that does not support EEE.

Node
  Valid Range/Settings: 0-n, where n is the number of the NUMA node that should be used to allocate memory for this adapter port; -1 uses the driver default of allocating memory on whichever processor is running modprobe.
  Default: -1 (off)
  Description: The Node parameter lets you pick which NUMA node the adapter allocates memory from. All driver structures, in-memory queues, and receive buffers are allocated on the node specified. This parameter is only useful when interrupt affinity is specified; otherwise, some portion of the time the interrupt could run on a different core than the one the memory is allocated on, causing slower memory access and impacting throughput, CPU utilization, or both.
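
The units used by the interrupt parameters above can be turned into times with a little arithmetic. This is an illustrative sketch with hypothetical helper names: a fixed InterruptThrottleRate of N interrupts per second implies interrupts spaced at least 1/N seconds apart, and the delay parameters count in units of 1.024 microseconds (1024 nanoseconds). Shell integer division truncates, so the microsecond result is rounded down.

```sh
# Shortest gap, in whole microseconds, between interrupts at a fixed
# InterruptThrottleRate of $1 interrupts per second.
itr_min_gap_us() {
    echo $((1000000 / $1))
}

# Convert a delay parameter value (units of 1.024 microseconds) to nanoseconds.
delay_units_to_ns() {
    echo $(($1 * 1024))
}
```

For example, itr_min_gap_us 8000 prints 125, and delay_units_to_ns 8 prints 8192, so the default RxAbsIntDelay of 8 corresponds to roughly 8.2 microseconds.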


Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Gigabit family of adapters is e1000e.

As an example, if you install the e1000e driver for two Gigabit adapters (eth0 and eth1) and want to set the interrupt mode to MSI-X and MSI respectively, add the following to modules.conf or /etc/modprobe.conf:
alias eth0 e1000e
alias eth1 e1000e
options e1000e IntMode=2,1
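
To make the effect of IntMode=2,1 concrete, the fallback order described for the IntMode parameter can be written out per setting. This is a sketch, not driver code: the function name is hypothetical, and the single-entry result for IntMode=0 is an assumption, since the text only spells out the orders for MSI-X and MSI.

```sh
# Print the interrupt modes that would be attempted, in order,
# for a given IntMode setting.
intmode_fallback_order() {
    case "$1" in
        2) echo "MSI-X MSI Legacy" ;;
        1) echo "MSI Legacy" ;;
        0) echo "Legacy" ;;   # assumption: legacy has nothing to fall back to
        *) echo "invalid IntMode: $1" >&2; return 1 ;;
    esac
}
```

With the example above, the first port (IntMode=2) would try MSI-X, then MSI, then Legacy, while the second port (IntMode=1) would try only MSI and Legacy.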

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set the console log level to eight by entering the following:

dmesg -n 8
NOTE: This setting is not saved across reboots.
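
To confirm the change took effect, the current console log level can be read back. A minimal sketch, assuming the kernel exposes it as the first field of /proc/sys/kernel/printk; the function name console_loglevel is hypothetical.

```sh
# Print the current console log level: the first field of the file $1
# (default /proc/sys/kernel/printk).
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# Example (not run here):
#   [ "$(console_loglevel)" -ge 8 ] || dmesg -n 8
```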

Jumbo Frames

Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) to a value larger than the default value of 1500. Use the ifconfig command to increase the MTU size. For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. The setting change can be made permanent by adding MTU=9000 to the file: /etc/sysconfig/network-scripts/ifcfg-eth<x> (Red Hat distributions). Other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234 bytes.

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.

  • The following adapters limit Jumbo Frames sized packets to a maximum of 4088 bytes:
    Intel® 82578DM Gigabit Network Connection
    Intel® 82577LM Gigabit Network Connection

  • The following adapters do not support Jumbo Frames:
    Intel® PRO/1000 Gigabit Server Adapter
    Intel® PRO/1000 PM Network Connection
    Intel® 82562V 10/100 Network Connection
    Intel® 82566DM Gigabit Network Connection
    Intel® 82566DC Gigabit Network Connection
    Intel® 82566MM Gigabit Network Connection
    Intel® 82566MC Gigabit Network Connection
    Intel® 82562GT 10/100 Network Connection
    Intel® 82562G 10/100 Network Connection
    Intel® 82566DC-2 Gigabit Network Connection
    Intel® 82562V-2 10/100 Network Connection
    Intel® 82562G-2 10/100 Network Connection
    Intel® 82562GT-2 10/100 Network Connection
    Intel® 82578DC Gigabit Network Connection
    Intel® 82577LC Gigabit Network Connection
    Intel® 82567V-3 Gigabit Network Connection

  • Jumbo Frames cannot be configured on an 82579-based Network device, if MACSec is enabled on the system.
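
The relationship between the 9216 MTU limit and the 9234-byte frame size in the notes above can be checked with simple arithmetic. This sketch assumes an untagged Ethernet frame, where the 14-byte header and 4-byte CRC are added to the MTU; the function name is hypothetical, and a VLAN tag would add further bytes.

```sh
# Frame size on the wire for an MTU of $1 bytes:
# 14-byte Ethernet header + payload (MTU) + 4-byte CRC.
mtu_to_frame_size() {
    echo $(($1 + 14 + 4))
}
```

Here mtu_to_frame_size 9216 prints 9234, matching the maximum Jumbo Frames size quoted above, and the default MTU of 1500 gives 1518.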

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/.

NOTE: When validating enable/disable tests on some parts (82578, for example) you need to add a few seconds between tests when working with ethtool.

Speed and Duplex Configuration

Speed and Duplex are configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

Enabling Wake on LAN* (WoL)

WoL is configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

For instructions on enabling WoL with ethtool, refer to the website listed above.

WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the e1000e driver must be loaded prior to shutting down or suspending the system.

NOTES: Wake On LAN is only supported on port A for the following devices:
  • Intel® PRO/1000 PT Dual Port Network Connection
  • Intel® PRO/1000 PT Dual Port Server Connection
  • Intel® PRO/1000 PT Dual Port Server Adapter
  • Intel® PRO/1000 PF Dual Port Server Adapter
  • Intel® PRO/1000 PT Quad Port Server Adapter
  • Intel® Gigabit PT Quad Port Server ExpressModule
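
As a quick check of the current WoL state, the "Wake-on:" line that ethtool prints for an interface can be extracted. This is a sketch with a hypothetical helper name; it assumes ethtool's usual output, in which "d" means disabled and "g" means wake on magic packet.

```sh
# Read `ethtool <interface>` output on stdin and print the current
# Wake-on setting (for example "d" or "g").
wol_setting() {
    awk '$1 == "Wake-on:" { print $2; exit }'
}

# Example (not run here):
#   ethtool eth0 | wol_setting      # show the current setting
#   ethtool -s eth0 wol g           # enable wake on magic packet
```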

NAPI

NAPI (Rx polling mode) is supported in the e1000e driver. NAPI is enabled by default.

To disable NAPI, compile the driver module, passing in a configuration option:

# make CFLAGS_EXTRA=-DE1000E_NO_NAPI install

For more information on NAPI, see ftp://robur.slu.se/pub/Linux/net-development/NAPI/usenix-paper.tgz.


Known Issues

NOTE: After installing the driver, if your Intel Network Connection is not working, verify that you have installed the correct driver.

Intel® Active Management Technology 2.0, 2.1, and 2.5 are not supported in conjunction with the Linux driver

Detected Tx Unit Hang in Quad Port Adapters

In some cases ports 3 and 4 don't pass traffic and report 'Detected Tx Unit Hang' followed by 'NETDEV WATCHDOG: ethX: transmit timed out' errors. Ports 1 and 2 don't show any errors and will pass traffic.

This issue MAY be resolved by updating to the latest kernel and BIOS. The user is encouraged to run an OS that fully supports MSI interrupts. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Adapters with 4 ports behind a PCIe bridge

Adapters that have 4 ports behind a PCIe bridge may be incompatible with some systems. The user should run the Linux firmware kit from http://www.linuxfirmwarekit.org/ to test their BIOS, if they have interrupt or "missing interface" problems, especially with older kernels.

82573(V/L/E) TX Unit Hang Messages

Several adapters with the 82573 chipset display "TX unit hang" messages during normal operation with the e1000e driver. The issue appears both with TSO enabled and disabled, and is caused by a power management function that is enabled in the EEPROM. Early releases of the chipsets to vendors had the EEPROM bit that enabled the feature. After the issue was discovered newer adapters were released with the feature disabled in the EEPROM.

If you encounter the problem in an adapter, and the chipset is an 82573-based one, you can verify that your adapter needs the fix by using ethtool:

 # ethtool -e eth0
 Offset          Values
 ------          ------
 0x0000          00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
 0x0010          ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83
                                                           ^^

The value at offset 0x001e (de) has bit 0 unset. This enables the problematic power saving feature. In this case, the EEPROM needs to read "df" at offset 0x001e.
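
The bit test described above is easy to reproduce in the shell. A minimal sketch with a hypothetical function name; it takes the byte read at offset 0x001e as two hex digits and only inspects bit 0. It does not read or modify the EEPROM.

```sh
# Succeeds when bit 0 is set in the hex byte $1 (for example "df").
eeprom_bit0_set() {
    [ $((0x$1 & 1)) -eq 1 ]
}

# Example (not run here):
#   eeprom_bit0_set de || echo "bit 0 unset: power saving feature enabled"
```

With the dump above, "de" fails the test (bit 0 unset) and the corrected value "df" passes.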

A one-time EEPROM fix is available as a shell script. This script will verify that the adapter is applicable to the fix and if the fix is needed or not. If the fix is required, it applies the change to the EEPROM and updates the checksum. The user must reboot the system after applying the fix if changes were made to the EEPROM.

Example output of the script:

 # bash fixeep-82573-dspd.sh eth0
 eth0: is a "82573E Gigabit Ethernet Controller"
 This fixup is applicable to your hardware
 executing command: ethtool -E eth0 magic 0x109a8086 offset 0x1e value 0xdf
 Change made. You *MUST* reboot your machine before changes take effect!

The script can be downloaded at http://e1000.sourceforge.net/files/fixeep-82573-dspd.sh

Dropped Receive Packets on Half-duplex 10/100 Networks

If you have an Intel PCI Express adapter running at 10 Mbps or 100 Mbps, half-duplex, you may observe occasional dropped receive packets. There are no workarounds for this problem in this network configuration. The network must be updated to operate in full-duplex, and/or 1000 Mbps only.

Compiling the Driver

When trying to compile the driver by running make install, the following error may occur:  "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

# make include/linux/version.h

Performance Degradation with Jumbo Frames

Degradation in throughput performance may be observed in some Jumbo frames environments. If this is observed, increasing the application's socket buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. See the specific application manual and /usr/src/linux*/Documentation/networking/ip-sysctl.txt for more details.

Jumbo frames on Foundry BigIron 8000 switch

There is a known issue using Jumbo frames when connected to a Foundry BigIron 8000 switch. This is a 3rd party limitation. If you experience loss of packets, lower the MTU size.

Allocating Rx Buffers when Using Jumbo Frames

Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if the available memory is heavily fragmented. This issue may be seen with PCI-X adapters or with packet split disabled. This can be reduced or eliminated by changing the amount of available memory for receive buffer allocation, by increasing /proc/sys/vm/min_free_kbytes.

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

        echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

   or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).

Disable rx flow control with ethtool

In order to disable receive flow control using ethtool, you must turn off auto-negotiation on the same command line.

For example:

     ethtool -A eth? autoneg off rx off

Unplugging network cable while ethtool -p is running

In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging the network cable while ethtool -p is running will cause the system to become unresponsive to keyboard commands, except for control-alt-delete. Restarting the system appears to be the only remedy.

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.
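
A quick way to tell whether a system falls in the affected range is to test the kernel release string. This is a sketch with a hypothetical function name; it looks only at the first three numeric fields of the version and does not check for MSI-X hardware.

```sh
# Succeeds when the kernel release $1 (default: the running kernel)
# is between 2.6.19 and 2.6.21 inclusive.
kernel_in_affected_range() {
    echo "${1:-$(uname -r)}" |
        awk -F. '{ exit !($1 == 2 && $2 == 6 && $3 + 0 >= 19 && $3 + 0 <= 21) }'
}

# Example (not run here):
#   if kernel_in_affected_range && pgrep irqbalance >/dev/null; then
#       echo "consider stopping irqbalance or upgrading the kernel"
#   fi
```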

Rx Page Allocation Errors

"Page allocation failure. order:0" errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Network throughput degradation observed with Onboard video versus add-in Video Card on 82579LM Gigabit Network Connection when used with some older kernels.

This issue can be worked around by specifying "pci=nommconf" as a kernel boot parameter, or by using another kernel boot parameter, "memmap=128M$0x100000000", which marks a 128 MB region at 4 GB as reserved so that the OS won't use these RAM pages.

This issue is fixed in kernel version 2.6.21, where the kernel tries to dynamically find out the mmconfig size by looking at the number of buses that the mmconfig segment maps to.

This issue won't be seen on 32bit version of EL5, as in that case, the kernel sees that RAM is located around the 256MB window and avoids using the mmconfig space.

Activity LED blinks unexpectedly

If a system based on the 82577, 82578, or 82579 controller is connected to a hub, the Activity LED will blink for all network traffic present on the hub. Connecting the system to a switch or router will filter out most traffic not addressed to the local port.

Link may take longer than expected

With some Phy and switch combinations, link can take longer than expected. This can be an issue on Linux distributions that time out when checking for link prior to acquiring a DHCP address; however, there is usually a way to work around this (e.g. set LINKDELAY in the interface configuration on RHEL).

Tx flow control is disabled by default on 82577 and 82578-based adapters

Possible performance degradation on certain 82566 and 82567 devices

Internal stress testing with jumbo frames shows the reliability on some 82566 and 82567 devices is improved in certain corner cases by disabling the Early Receive feature. Doing so can impact Tx performance. To reduce the impact, the packet buffer sizes and relevant flow control settings are modified accordingly.


Using the igbvf Base Driver

Overview

Building and Installation

Command Line Parameters

Additional Configurations

Known Issues

Overview

This driver supports upstream kernel versions 2.6.30 (or higher) x86_64.

Supported Operating Systems: SLES 11 SP1 x86_64, RHEL 5.3/5.4 x86_64.

The igbvf driver supports 82576-based virtual function devices that can only be activated on kernels that support SR-IOV. SR-IOV requires the correct platform and OS support.

The igbvf driver requires the igb driver, version 2.0 or later. The igbvf driver supports virtual functions generated by the igb driver with a max_vfs value of 1 or greater. For more information on the max_vfs parameter, refer to the section on the igb driver.

The guest OS loading the igbvf driver must support MSI-X interrupts.

This driver is only supported as a loadable module at this time. Intel is not supplying patches against the kernel source to allow for static linking of the driver. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

Instructions on updating ethtool can be found in the section Additional Configurations later in this document.

VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs.

Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb <filename.tar.gz>'. Replace <filename.tar.gz> with the specific filename of the driver.

NOTE: For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/igbvf' or '/usr/local/src/igbvf'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf igbvf-<x.x.x>.tar.gz

  3. Change to the driver src directory:

    cd igbvf-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igbvf/igbvf.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using the modprobe command:

    modprobe igbvf

    On 2.6-based kernels, also make sure that older igbvf drivers are removed from the kernel before loading the new module:

    rmmod igbvf; modprobe igbvf

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>

Troubleshooting: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver generates an interrupt every two seconds. If cat /proc/interrupts shows that the ethX igbvf device is no longer receiving interrupts, this workaround may be necessary.

Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe igbvf [<option>=<VAL1>,<VAL2>,...]

There needs to be a <VAL#> for each network port in the system supported by this driver. The values will be applied to each instance, in function order. For example:

modprobe igbvf InterruptThrottleRate=16000,16000

In this case, there are two network ports supported by igbvf in the system. The default value for each parameter is generally the recommended setting, unless otherwise noted.

NOTES:
  • For more information about the InterruptThrottleRate parameter, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
InterruptThrottleRate
  Valid Range/Settings: 0, 1, 3, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
  Default: 3
  Description: The driver can limit the number of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum number of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behavior of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency. The hardware can handle many more small packets per second, however, and for this reason an adaptive interrupt moderation algorithm was implemented.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: Dynamic interrupt throttling is only applicable to adapters operating in MSI or Legacy interrupt mode, using a single receive queue.

NOTE: When igbvf is loaded with default settings and multiple adapters are in use simultaneously, the CPU utilization may increase non-linearly. In order to limit the CPU utilization without impacting the overall throughput, we recommend that you load the driver as follows:

modprobe igbvf InterruptThrottleRate=3000,3000,3000

This sets the InterruptThrottleRate to 3000 interrupts/sec for the first, second, and third instances of the driver. The range of 2000 to 3000 interrupts per second works on a majority of systems and is a good starting point, but the optimal value will be platform-specific. If CPU utilization is not a concern, use default driver settings.
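
Because the parameter takes one comma-separated value per driver instance, the option string grows with the number of adapters. As a minimal sketch, the hypothetical helper itr_option below (not part of the driver package) builds that string for a given instance count and rate. It only prints the resulting command, since running modprobe requires root privileges.

```shell
# Hypothetical helper: build an InterruptThrottleRate option that repeats
# one rate for a given number of driver instances.
itr_option() {
    count=$1
    rate=$2
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$rate"   # add a comma only between values
        i=$((i + 1))
    done
    echo "InterruptThrottleRate=$list"
}

# Print the command rather than running it, since modprobe needs root.
echo "modprobe igbvf $(itr_option 3 3000)"
```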

Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Gigabit Family of Adapters is igbvf.

As an example, if you install the igbvf driver for two Gigabit adapters (eth0 and eth1) and want to set InterruptThrottleRate to mode 3 and mode 1 respectively, add the following to modules.conf or /etc/modprobe.conf:

alias eth0 igbvf
alias eth1 igbvf
options igbvf InterruptThrottleRate=3,1

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set the console log level to 8 by entering the following:

dmesg -n 8
NOTE: This setting is not saved across reboots.
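
To confirm the change, the current console log level can be read back. On Linux it is the first field of /proc/sys/kernel/printk. The sketch below uses a hypothetical helper name, console_loglevel, and accepts an optional file argument purely so that it can be tried against a sample file.

```shell
# Hypothetical helper: print the console log level, which is the first
# field of /proc/sys/kernel/printk. An alternate file may be passed.
console_loglevel() {
    read -r level rest < "${1:-/proc/sys/kernel/printk}" || return 1
    echo "$level"
}

# Usage on a live system:
#   console_loglevel      # prints 8 after "dmesg -n 8"
```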

Jumbo Frames

Jumbo Frames support is enabled by changing the MTU to a value larger than the default of 1500. Use the ifconfig command to increase the MTU size.

For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. It can be made permanent if you add:

MTU=9000

to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example applies to the Red Hat distributions; other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234 bytes.

  • Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.
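
The relationship between the two limits in the notes can be checked with simple arithmetic. The sketch below assumes that the 18-byte difference between 9216 and 9234 is the 14-byte Ethernet header plus the 4-byte CRC; the helper names frame_size and is_valid_jumbo_mtu are hypothetical.

```shell
MAX_MTU=9216        # maximum MTU stated in the notes above
FRAME_OVERHEAD=18   # assumed: 14-byte Ethernet header + 4-byte CRC

# Hypothetical helper: frame size for a given MTU.
frame_size() {
    echo $(( $1 + FRAME_OVERHEAD ))
}

# Hypothetical helper: succeed only if the MTU enables Jumbo Frames
# (above 1500) without exceeding the stated maximum.
is_valid_jumbo_mtu() {
    [ "$1" -gt 1500 ] && [ "$1" -le "$MAX_MTU" ]
}

frame_size "$MAX_MTU"    # prints 9234
```

On systems where ifconfig is not installed, the iproute2 equivalent of the command above is "ip link set dev eth<x> mtu 9000".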

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3.0 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/

Known Issues/Troubleshooting

NOTE: After installing the driver, if your Intel Network Connection is not working, verify that you have installed the correct driver.
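
One way to check which driver is bound to an interface is "ethtool -i", whose output includes a "driver:" line. The following sketch extracts that value; the helper name driver_name is hypothetical and eth0 is a placeholder interface name.

```shell
# Hypothetical helper: read "ethtool -i" output on stdin and print the
# value of its "driver:" line.
driver_name() {
    sed -n 's/^driver:[[:space:]]*//p'
}

# Usage on a live system (eth0 is a placeholder interface name):
#   ethtool -i eth0 | driver_name
```

If the printed name is not the driver you expected (for example, igbvf for an 82576 virtual function), the adapter is being handled by a different module.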

Driver Compilation

When trying to compile the driver by running make install, the following error may occur:

"Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

make include/linux/version.h

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering or install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned into VLANs). To turn on ARP filtering, enter:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(This only works if your kernel version is higher than 2.4.5.)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf.
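
To check whether the persistent setting is already in place, the configuration file can be searched for that line. This is a minimal sketch; the helper name arp_filter_persisted is hypothetical, and the optional file argument exists only so the check can be tried on a sample file.

```shell
# Hypothetical helper: succeed if the given sysctl configuration file
# (default /etc/sysctl.conf) enables ARP filtering for all interfaces.
arp_filter_persisted() {
    grep -Eq '^[[:space:]]*net\.ipv4\.conf\.all\.arp_filter[[:space:]]*=[[:space:]]*1[[:space:]]*$' \
        "${1:-/etc/sysctl.conf}"
}

# Usage on a live system:
#   cat /proc/sys/net/ipv4/conf/all/arp_filter    # current runtime value
#   arp_filter_persisted && echo "ARP filtering will persist across reboots"
```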

Do Not Use LRO When Routing Packets

Due to a known general compatibility issue with LRO and routing, do not use LRO when routing packets.

Build error with Asianux 3.0 - redefinition of typedef 'irq_handler_t'

Some systems may experience build issues due to redefinition of irq_handler_t. To resolve this issue, build the driver (step 4 above) using the command:

make CFLAGS_EXTRA=-DAX_RELEASE_CODE=1 install

MSI-X Issues with Kernels between 2.6.19 and 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.
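
Whether a system falls in the affected range can be checked from its kernel release string. The sketch below assumes release strings of the form major.minor.patch with an optional suffix, as printed by "uname -r"; the helper name in_affected_range is hypothetical.

```shell
# Hypothetical helper: succeed if a kernel release string such as the
# output of "uname -r" lies between 2.6.19 and 2.6.21 inclusive.
in_affected_range() {
    ver=${1%%-*}                 # drop suffixes such as "-1.fc6"
    old_ifs=$IFS
    IFS=.
    set -- $ver                  # split on dots into $1 $2 $3
    IFS=$old_ifs
    major=$1
    minor=$2
    patch=${3:-0}
    patch=${patch%%[!0-9]*}      # keep only the leading digits
    [ "$major" = 2 ] && [ "$minor" = 6 ] &&
        [ "${patch:-0}" -ge 19 ] && [ "${patch:-0}" -le 21 ]
}

# Usage on a live system:
#   in_affected_range "$(uname -r)" && echo "consider disabling irqbalance"
```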

Rx Page Allocation Errors

"Page allocation failure. order:0" errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Under Red Hat 5.4 - System May Crash when Closing Guest OS Window after Loading/Unloading Physical Function (PF) Driver

Do not remove the igbvf driver from Dom0 while Virtual Functions (VFs) are assigned to guests. First either use the xm "pci-detach" command to hot-unplug each VF device from the VM it is assigned to, or shut down the VM.

Unloading Physical Function (PF) Driver Causes System Reboots When VM is Running and VF is Loaded on the VM

Do not unload the PF driver (igb) while VFs are assigned to guests.

Host May Reboot after Removing PF when VF is Active in Guest

On kernel versions earlier than 3.2, do not unload the PF driver with active VFs. Doing this will cause your VFs to stop working until you reload the PF driver and may cause a spontaneous reboot of your system.

 


Last modified on 11/03/11 4:12p Revision