The default FreeBSD configuration is optimized for compatibility, not maximum network throughput. This becomes especially visible during iperf testing, routing benchmarks, or high-traffic workloads where mbuf exhaustion or CPU bottlenecks can occur. Let's walk through the relevant tunings.
Enable multiple network threads
FreeBSD defaults can be too conservative for multi-core systems.
net.isr.maxthreads=<number of CPU cores or NIC queues>
net.isr.bindthreads=1
net.isr.dispatch=direct
Recommendations:
- For a 4-core system, use maxthreads=4
- For an 8-core system, use maxthreads=8
- For a 16-core system, use maxthreads=16
dispatch=direct processes each packet in the context of the receiving thread instead of queueing it to a netisr thread, which avoids context switches and gives the best throughput (iperf2, DPDK-like behavior).
Set net.isr.dispatch in /etc/sysctl.conf; note that net.isr.maxthreads and net.isr.bindthreads are boot-time tunables on most FreeBSD releases and belong in /boot/loader.conf:
net.isr.maxthreads=4
net.isr.bindthreads=1
net.isr.dispatch=direct
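To confirm the settings took effect and to see how packet work is spread across threads, the netisr state can be inspected at runtime (output format varies slightly between releases):
sysctl net.isr          # effective dispatch policy, thread count, binding
netstat -Q              # per-protocol netisr queue and dispatch statistics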
Increase mbuf/cluster limits
Critical for high PPS or large packet workloads.
kern.ipc.nmbclusters=524288
kern.ipc.nmbjumbop=524288
kern.ipc.nmbjumbo9=131072
kern.ipc.nmbjumbo16=65536
Rules of thumb:
- For a 10G network, use 256k+ clusters
- For 25/40/100G, use 512k+ clusters
- For VXLAN/jumbo frames, jumbop/jumbo9 tuning is required
Check actual pool usage (as shown below) before raising these values further.
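Current consumption and failure counters for the mbuf pools can be checked at runtime; steadily growing "denied" counters mean the limits above are still too low:
netstat -m                  # mbuf/cluster usage and "requests denied" counters
vmstat -z | grep -i mbuf    # per-zone allocation statistics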
Enable hardware offloading
net.inet.tcp.tso=1
net.inet.tcp.lro=1
net.inet.tcp.sack.enable=1
net.inet.ip.fastforwarding=1
Enable ip.fastforwarding ONLY if the system is not running PF or IPFW, since it bypasses the normal firewall path (on recent FreeBSD releases fast forwarding is the default behavior and this sysctl may no longer exist). LRO should likewise be disabled on firewalls and routers, because coalescing received segments interferes with packet-by-packet filtering and forwarding.
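When a firewall or router role is involved, offloads can also be toggled per interface rather than globally; ix0 is just a placeholder interface name here:
ifconfig ix0 -lro -tso          # disable LRO/TSO on a filtering/forwarding box
ifconfig ix0 lro tso            # enable them on a pure endpoint
ifconfig ix0 | grep options     # verify which capabilities are currently active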
Tune TCP stack for high throughput
Increase the socket buffers:
net.inet.tcp.recvspace=1048576
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.delayed_ack=0
For long-distance/high-bandwidth tests (iperf across DCs):
net.inet.tcp.cc.algorithm=cubic
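CUBIC must be available in the kernel before it can be selected; recent releases (FreeBSD 14 and later) ship it as the default, while older ones need the cc_cubic module. A quick check and load, assuming nothing beyond the base system:
sysctl net.inet.tcp.cc.available    # list the congestion-control algorithms present
kldload cc_cubic                    # load the module if cubic is not listed
# to load it at every boot, add to /boot/loader.conf:
# cc_cubic_load="YES"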
Optimize NIC queueing
Most 10/40/100G NICs create multiple receive and transmit queues, typically one per CPU core.
Check queues:
sysctl dev.<driver>.<unit>.rx_queues
sysctl dev.<driver>.<unit>.tx_queues
Bind netisr threads to CPUs evenly (boot-time tunable, set in /boot/loader.conf):
net.isr.bindthreads=1
For Intel 82599/X520 (ix driver; the X710/XL710 use ixl(4) with the corresponding hw.ixl tunables), add to /boot/loader.conf:
hw.ix.max_queues=0 # auto
hw.ix.rxd=2048
hw.ix.txd=2048
For Mellanox ConnectX-3 (mlx4) / ConnectX-4/5 (mlx5), also in /boot/loader.conf:
hw.mlx4.en.num_comp_vectors=16
hw.mlx5.eq_depth=2048
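To verify that the queues and their interrupts actually land on different cores, the per-vector interrupt counters and the kernel thread load can be watched (ix0 is a placeholder driver/unit name):
vmstat -i | grep ix0    # one MSI-X vector per RX/TX queue pair should appear
top -SHP                # per-CPU view; look for the interrupt and netisr kernel threads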
Disable power-saving features
NICs and CPUs must not downclock.
CPU
Add to /boot/loader.conf:
machdep.hwpstate_pkg_ctrl=0
Disable deep C-states (for all cores):
sysctl hw.acpi.cpu.cx_lowest=C1
or persistently via /etc/rc.conf:
performance_cx_lowest="C1"
NIC power saving
Some drivers expose power-management features such as Energy Efficient Ethernet through driver-specific sysctls or loader tunables; there is no generic ifconfig switch for wired NICs, so check the driver's man page for the exact knob.
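A quick sanity check that the CPU is neither downclocking nor sleeping deeply under load:
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels    # current and available CPU frequencies
sysctl dev.cpu.0.cx_usage                      # time distribution across C-states
sysctl hw.acpi.cpu.cx_lowest                   # effective lowest allowed C-state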
PF or IPFW optimization (if used)
PF: increase state handling capacity
set limit states 2000000
set limit frags 2000000
set limit src-nodes 200000
Avoid scrub rules (packet normalization is expensive), and skip PF entirely on trusted interfaces:
set skip on <interfaces>
Disable debugging
set debug none
Increase the PF source-node hash table (boot-time tunable, /boot/loader.conf)
net.pf.source_nodes_hashsize=2048
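Putting the PF pieces together, a minimal /etc/pf.conf fragment with the limits above, plus the runtime counters to watch (the limit values are examples; each state entry costs kernel memory, so size them to available RAM):
# /etc/pf.conf
set limit { states 2000000, frags 2000000, src-nodes 200000 }
set debug none
# runtime monitoring
pfctl -si    # state-table counters and memory-limit hits
pfctl -sm    # currently configured hard limits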
sysctl.conf best practice bundle
Here is a recommended config for a modern 10/40G FreeBSD server:
### Multithreaded ISR (maxthreads/bindthreads usually need to go into /boot/loader.conf, see above)
net.isr.maxthreads=16
net.isr.bindthreads=1
net.isr.dispatch=direct
### Network buffers
kern.ipc.nmbclusters=524288
kern.ipc.nmbjumbop=524288
kern.ipc.nmbjumbo9=131072
kern.ipc.nmbjumbo16=65536
### TCP tuning
net.inet.tcp.recvspace=1048576
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.delayed_ack=0
net.inet.tcp.cc.algorithm=cubic
### Optimizations
net.inet.tcp.tso=1
net.inet.tcp.lro=1
net.inet.tcp.sack.enable=1
net.inet.ip.fastforwarding=1 # only if no PF/IPFW
iperf2/iperf3 testing
Client/Server testing
Server: iperf -s
Client: iperf -c <server> -P 8 -w 2M
- Use multiple parallel streams (-P 8 to -P 32)
- Use -w large socket buffers (1M–16M)
If PPS is the bottleneck, increase the mbuf limits.
If CPU is the bottleneck, tune the netisr threads and offloads.
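One way to tell the two apart, assuming iperf3 is installed on both ends: a small-datagram UDP test stresses packet rate and the mbuf pools, while a multi-stream TCP test stresses CPU and offloads:
iperf3 -c <server> -u -b 0 -l 64 -P 4    # tiny UDP datagrams: PPS/mbuf stress
iperf3 -c <server> -P 16 -w 4M           # parallel TCP streams: CPU/offload stress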
Tuning FreeBSD Network Performance Inside a VMware Virtual Machine
Running FreeBSD as a virtual machine inside VMware ESXi is very common in labs, routing platforms, or network testing setups. However, the default FreeBSD configuration is optimized for compatibility, not maximum network throughput. This becomes visible especially during iperf testing, routing benchmarks, or high-traffic workloads where mbuf exhaustion or CPU bottlenecks can occur.
Hardware details of my VM
root@fbsd01:~ # sysctl hw.model
hw.model: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
root@fbsd01:~ # sysctl hw.ncpu
hw.ncpu: 4
root@fbsd01:~ # sysctl hw.physmem
hw.physmem: 4250390528
root@fbsd01:~ # pciconf -lv | grep -A4 -i eth
    device     = 'VMXNET3 Ethernet Controller'
    class      = network
    subclass   = ethernet
root@fbsd01:~ # sysctl dev | grep -E '(ix|mlx|cxgb)'
dev.vmx.0.iflib.allocated_msix_vectors: 5
dev.vmx.0.iflib.use_extra_msix_vectors: 0
dev.vmx.0.iflib.disable_msix: 0
Understanding the Virtual NIC: VMXNET3
VMware presents a synthetic network adapter called VMXNET3.
It supports:
- Multiple RX/TX queues
- MSI-X interrupts
- TSO/LRO offloading
- High throughput (10–40+ Gb/s in multi-queue mode)
On FreeBSD, VMXNET3 is handled by the vmx(4) driver, built on top of iflib.
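The iflib state of the adapter (MSI-X vector allocation, ring overrides) is exposed under the device's sysctl tree; unit 0 corresponds to the vmx0 interface used below:
sysctl dev.vmx.0.iflib                       # MSI-X vectors and ring-size overrides
sysctl dev.vmx.0 | grep -iE 'queue|msix'     # any other queue/MSI-X related entries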
Increase Mbuf Space (Fix nmbclusters limit reached)
Under heavy iperf or firewall traffic, you may see:
kern.ipc.nmbclusters limit reached
kern.ipc.nmbjumbop limit reached
This indicates the system exhausted the default packet memory pool.
Add to /etc/sysctl.conf:
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=131072
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbo16=16384
Or globally raise the overall mbuf limit:
kern.ipc.nmbufs=2097152
Reboot, or re-apply the file immediately with service sysctl restart.
Enable Multi-Core Packet Processing
By default, FreeBSD processes packets in a single netisr thread.
To spread packet load across all vCPUs:
net.isr.dispatch=deferred
net.isr.maxthreads=4
net.isr.bindthreads=1
This lets FreeBSD scale RX/TX across multiple VMXNET3 queues.
Increase TCP Buffers (Iperf Benefit)
For 10G+ throughput, enlarge socket buffers:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304
net.inet.tcp.delayed_ack=0
net.inet.tcp.cc.algorithm=cubic
This improves large window TCP performance.
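The 16 MB maximums follow roughly from the bandwidth-delay product: a 10 Gbit/s flow with an assumed 10 ms round-trip time keeps about 12.5 MB in flight, so the buffer ceiling must be at least that large. A quick back-of-the-envelope check:
# BDP in bytes = bandwidth (bit/s) * RTT (s) / 8; 10 Gbit/s and 10 ms RTT assumed
echo "10 * 10^9 * 0.010 / 8" | bc -l    # ~12500000 bytes, i.e. about 12.5 MB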
Enable Offloading (TSO/LRO)
VMXNET3 supports good offloading. Enable it explicitly:
ifconfig vmx0 tso lro
Permanent setting in /etc/rc.conf (keep your existing address/DHCP options on the same line):
ifconfig_vmx0="up tso lro"
Offloads reduce CPU cycles per packet and dramatically help iperf.
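Whether the offloads are actually active can be confirmed from the options line of the interface:
ifconfig vmx0 | grep -i options    # should include TSO4 and LRO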
Optional: Increase VMXNET3 Ring Sizes
If you want deeper RX/TX descriptor queues:
Add to /boot/loader.conf:
hw.vmx.iflib.override_nrxds="2048,2048,2048,2048"
hw.vmx.iflib.override_ntxds="1024,1024,1024,1024"
Reboot and verify:
sysctl dev.vmx.0.iflib.override_nrxds
Optional VMware Host Settings
To maximize performance:
VM Hardware
- Use VMXNET3
- Reserve full RAM for the VM
- Set Latency Sensitivity = High
- Enable Expose hardware-assisted virtualization
- Set CPU to high performance mode
ESXi Host
- Disable deep C-states (BIOS)
- Keep vCPUs in a single NUMA node when possible
Recommended iperf Commands
iperf3
iperf3 -s
iperf3 -c <host> -P 8 -w 4M
iperf2 (often scales better with many parallel streams):
iperf -s
iperf -c <host> -P 8 -w 4M
Typical results on VMware:
- 7–10 Gbit/s (single NIC)
- 12–20 Gbit/s (multi-stream)
- 20–35 Gbit/s (multiple vNICs)
Summary of FreeBSD VMware VM Tuning Configuration
/etc/sysctl.conf
net.isr.dispatch=deferred
net.isr.maxthreads=4
net.isr.bindthreads=1
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=131072
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbo16=16384
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304
net.inet.tcp.delayed_ack=0
net.inet.tcp.cc.algorithm=cubic
/etc/rc.conf
ifconfig_vmx0="up tso lro"
/boot/loader.conf
kern.hz=1000
machdep.hwpstate_pkg_ctrl=0
hw.pci.enable_msi=1
hw.pci.honor_msi_blacklist=0 # allow MSI/MSI-X even on blacklisted (virtual) chipsets
# optional tuning
hw.vmx.iflib.override_nrxds="2048,2048,2048,2048"
hw.vmx.iflib.override_ntxds="1024,1024,1024,1024"
Conclusion
With the above tuning, FreeBSD becomes a very capable high-performance VM inside VMware ESXi. VMXNET3 combined with proper mbuf sizing, TCP buffer tuning, and multi-queue netisr scaling allows FreeBSD to achieve near–bare-metal throughput, especially in iperf testing scenarios.
If you're running firewalls, routers, monitoring systems, or storage services on FreeBSD under VMware, these settings will significantly improve performance and stability.