IP Layer

IPv4

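On Linux, use ping <addr> -M do -s 8000 to set the DF (Don't Fragment) bit and exercise PMTUD (Path MTU Discovery). The -s option sets the ICMP payload size, so the packet on the wire is 28 bytes larger (20-byte IPv4 header plus 8-byte ICMP header).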

$ ping 1.1 -M do -s 8000
PING 1.1 (1.0.0.1) 8000(8028) bytes of data.
From 140.91.232.7 icmp_seq=1 Frag needed and DF set (mtu = 1500)
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
...

On macOS, use ping <addr> -D -s 8000.
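Probing by hand with a single size only tells you whether that size fits. A minimal sketch of how the largest working payload could be found by bisection follows. The `probe` callable is hypothetical: it stands for "send one DF-marked echo request with this payload size and report whether a reply came back", and the sketch assumes that any size at or below the path MTU succeeds and anything above fails.

```python
from typing import Callable


def largest_working_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 8000) -> int:
    """Return the largest payload size in [low, high] for which probe() succeeds.

    `probe` is a hypothetical callable supplied by the caller. It is assumed
    to be monotonic: every size up to some limit works, every larger one fails.
    """
    if not probe(low):
        raise ValueError("even the smallest payload did not get through")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits, so the answer is mid or larger
        else:
            high = mid - 1   # mid is too big, so the answer is smaller
    return low


# Stand-in probe for a path whose IPv4 MTU is 1500 (20-byte IP + 8-byte ICMP header).
def fake_probe(size: int) -> bool:
    return size + 28 <= 1500


print(largest_working_payload(fake_probe))  # 1472
```

On Linux, one way to build a real probe would be to run ping -c 1 -M do -s <size> <addr> through subprocess and treat a zero exit status as success, but that wiring is left out here.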

IPv6

On Linux the command is the same; use ping -6 or ping6 to force IPv6.

On macOS 14, try sudo ping6 <addr> -s 8000 -Dm to prohibit fragmentation.

$ sudo ping6 he.net -s 8000 -Dm
PING6(8048=40+8+8000 bytes) xxxx:xxxx --> 2001:470:0:503::2
ping6: sendmsg: Message too long
ping6: wrote he.net 8008 chars, ret=-1
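The sizes in the outputs above (8000(8028) for IPv4 and 8048=40+8+8000 for IPv6) come from adding the ICMP echo header and the IP header to the payload given with -s. A small sketch of that arithmetic, assuming an IPv4 header without options and an IPv6 packet without extension headers:

```python
ICMP_ECHO_HEADER = 8          # ICMP and ICMPv6 echo headers are both 8 bytes
IP_HEADER = {4: 20, 6: 40}    # IPv4 without options, IPv6 without extension headers


def packet_size(payload: int, version: int) -> int:
    """Size of the IP packet produced by `ping -s payload`."""
    return payload + ICMP_ECHO_HEADER + IP_HEADER[version]


def max_payload(mtu: int, version: int) -> int:
    """Largest `-s` value that still fits in one packet of the given MTU."""
    return mtu - ICMP_ECHO_HEADER - IP_HEADER[version]


print(packet_size(8000, 4))   # 8028
print(packet_size(8000, 6))   # 8048
print(max_payload(1500, 4))   # 1472
print(max_payload(1500, 6))   # 1452
print(max_payload(1280, 6))   # 1232
```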

PMTUD matters more in IPv6 environments because IPv6 routers never fragment packets in transit, there are fewer NAT middleboxes that apply MSS clamping, and for the reasons below.

But why did this problem not appear for IPv4 traffic? We believe the same issue exists on IPv4, but it’s less damaging due to the different nature of the network. IPv4 is more mature and the great majority of end-hosts support either MTU 1500 or have their MSS option well configured - or clamped by some middle box. This is different in IPv6 where a large proportion of users use tunnels, have Path MTU strictly smaller than 1500 and use incorrect MSS settings in the TCP header. Finally, Linux implements RFC4821 for IPv4 but not IPv6. RFC4821 (PLPMTUD) has its disadvantages, but does slightly help to alleviate the ICMP blackhole issue.

Firewall

Make sure to allow ICMPv6 “Packet Too Big” (PTB) messages for all IPv6 hosts you need to communicate with. Otherwise PMTUD will not work and connections can hang.

OpenWrt 6rd Tunnel MTU

Take 6rd as an example. With a 1500-byte MTU on the upstream link, the 6rd interface defaults to an MTU of 1280 bytes per the 6rd specification (RFC 5969), while IPv4 still has a 1500-byte MTU.

If the MTU is well-managed such that the IPv4 MTU on the CE WAN side interface is set so that no fragmentation occurs within the boundary of the SP, then the 6rd Tunnel MTU should be set to the known IPv4 MTU minus the size of the encapsulating IPv4 header (20 bytes). For example, if the IPv4 MTU is known to be 1500 bytes, the 6rd Tunnel MTU might be set to 1480 bytes. Absent more specific information, the 6rd Tunnel MTU SHOULD default to 1280 bytes.
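The rule quoted above can be written down as a short sketch. The function name and the rejection of values below 1280 are assumptions made for illustration; the 20-byte header and the 1280-byte default come from the quoted text.

```python
from typing import Optional

IPV4_HEADER = 20      # encapsulating IPv4 header, without options
IPV6_MIN_MTU = 1280   # default when nothing better is known


def sixrd_tunnel_mtu(known_ipv4_mtu: Optional[int] = None) -> int:
    """Suggested 6rd tunnel MTU for a given, well-managed IPv4 MTU."""
    if known_ipv4_mtu is None:
        # Absent more specific information, fall back to 1280 bytes.
        return IPV6_MIN_MTU
    mtu = known_ipv4_mtu - IPV4_HEADER
    if mtu < IPV6_MIN_MTU:
        # Assumption of this sketch: refuse values an IPv6 link cannot use,
        # since IPv6 requires every link to carry at least 1280 bytes.
        raise ValueError("IPv4 MTU too small to carry IPv6 without fragmentation")
    return mtu


print(sixrd_tunnel_mtu(1500))  # 1480
print(sixrd_tunnel_mtu())      # 1280
```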

On Linux, even though ip link may show mtu 1500, the MTU used for IPv6 may differ. You can check the actual value in /proc/sys/net/ipv6/conf/<interface>/mtu. This value is adjusted automatically according to the Router Advertisements sent by the router, and OpenWrt takes care of it automatically by inheriting the upstream MTU.
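For scripting, the per-interface IPv6 MTU can be read straight from that file. A minimal sketch, Linux only; the `root` parameter is a hypothetical addition so the function can be pointed at a test directory, and the interface name in the usage note is a placeholder.

```python
from pathlib import Path


def ipv6_mtu(interface: str, root: str = "/proc/sys/net/ipv6/conf") -> int:
    """Return the IPv6 MTU the kernel currently uses for `interface`."""
    # The file holds a single decimal number followed by a newline.
    return int((Path(root) / interface / "mtu").read_text().strip())
```

Calling ipv6_mtu("eth0") on a client behind a default 6rd setup would be expected to return 1280 once the Router Advertisement has been processed, even if ip link still reports 1500.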

If for some reason a client sends packets bigger than 1280 bytes, the router will return ICMPv6 PTB messages to ask the client to resend the data in smaller datagrams.
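To make the PTB message concrete: an ICMPv6 Packet Too Big message has type 2, followed by a code byte, a 16-bit checksum, and a 32-bit MTU field, after which comes as much of the offending packet as fits. A sketch of extracting the advertised MTU from raw ICMPv6 bytes follows; the function name is made up for illustration and the checksum is not verified.

```python
import struct

ICMPV6_PACKET_TOO_BIG = 2


def parse_packet_too_big(icmpv6: bytes) -> int:
    """Return the MTU advertised in an ICMPv6 Packet Too Big message."""
    if len(icmpv6) < 8:
        raise ValueError("truncated ICMPv6 message")
    # Network byte order: type (1), code (1), checksum (2), MTU (4).
    msg_type, _code, _checksum, mtu = struct.unpack("!BBHI", icmpv6[:8])
    if msg_type != ICMPV6_PACKET_TOO_BIG:
        raise ValueError(f"not a Packet Too Big message (type {msg_type})")
    return mtu


# Hand-built example: type 2, code 0, zero checksum, MTU 1280.
sample = struct.pack("!BBHI", 2, 0, 0, 1280)
print(parse_packet_too_big(sample))  # 1280
```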

Upper Layer

TCP

For TCP connections, PMTUD is handled automatically by the kernel or NIC.

UDP

If the don’t-fragment flag is set on a UDP or raw IP socket, sending a datagram bigger than the known path MTU fails with an EMSGSIZE error. On Linux the flag is set with the IP_MTU_DISCOVER socket option and the value IP_PMTUDISC_DO (see ip(7)); on FreeBSD it is set with the IP_DONTFRAG socket option (see ip(4)).
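A minimal Linux-only sketch of this behaviour for an IPv4 UDP socket follows. The helper names are made up for illustration. The numeric fallbacks (10 and 2) are the Linux values of IP_MTU_DISCOVER and IP_PMTUDISC_DO and are used only if the running Python's socket module does not export the constants; on other systems they would be wrong.

```python
import errno
import socket

# Linux values from <linux/in.h>, used as a fallback (assumption: Linux host).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def open_df_udp_socket() -> socket.socket:
    """Create an IPv4 UDP socket that always sets DF and never fragments locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


def send_unless_too_big(sock: socket.socket, payload: bytes, addr) -> bool:
    """Send one datagram; return False if the kernel rejects it with EMSGSIZE."""
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if exc.errno == errno.EMSGSIZE:
            # Larger than the currently known path MTU (or the UDP maximum):
            # the caller should retry with a smaller datagram.
            return False
        raise
```

A sender built on this would shrink its datagrams when send_unless_too_big returns False and rely on its own retransmission logic for packets lost before the kernel learned the smaller path MTU.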

Note: An implementation can avoid the use of an asynchronous notification mechanism for PMTU decreases by postponing notification until the next attempt to send a packet larger than the PMTU estimate. In this approach, when an attempt is made to SEND a packet that is larger than the PMTU estimate, the SEND function should fail and return a suitable error indication. This approach may be more suitable to a connectionless packetization layer (such as one using UDP), which (in some implementations) may be hard to “notify” from the ICMPv6 layer. In this case, the normal timeout-based retransmission mechanisms would be used to recover from the dropped packets.

VPN

For Layer 3 VPNs (L3VPN) that need to transmit IPv6 packets, 1280 bytes plus packet encapsulation overhead is the minimum MTU required.

This causes fragmentation on a default OpenWrt 6rd setup, which has an MTU of 1280 bytes. You can verify this with tcpdump -i eth0.X host <6rd_peeraddr> on the router. A ping6 he.net -s 1233 on the client produces a 1281-byte IPv6 packet (1233 bytes of payload, an 8-byte ICMPv6 header and a 40-byte IPv6 header). If it triggers ipv6-frag in the tcpdump output, that packet had to be fragmented before being sent over IPv4 to the 6rd peer.

Because of this, for best VPN performance, it is recommended to explicitly set the 6rd tunnel MTU to 1480 bytes if the IPv4 MTU is 1500 bytes, and then verify with tcpdump that packets are not fragmented on the wire.
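The reasoning above reduces to one comparison: the smallest packet the VPN must carry unfragmented (1280 bytes of inner IPv6) plus the VPN's own encapsulation overhead has to fit in the tunnel MTU. A sketch, where the 80-byte overhead is a purely hypothetical figure chosen for illustration; substitute the real overhead of the VPN in use.

```python
IPV6_MIN_MTU = 1280  # every IPv6 link, including the one inside the VPN, must carry this


def required_outer_mtu(vpn_overhead: int) -> int:
    """Smallest MTU the underlying link needs so the VPN never fragments."""
    return IPV6_MIN_MTU + vpn_overhead


def fragments_on_tunnel(tunnel_mtu: int, vpn_overhead: int) -> bool:
    """True if a minimum-size inner IPv6 packet no longer fits after encapsulation."""
    return required_outer_mtu(vpn_overhead) > tunnel_mtu


HYPOTHETICAL_OVERHEAD = 80  # assumption for illustration only

print(required_outer_mtu(HYPOTHETICAL_OVERHEAD))         # 1360
print(fragments_on_tunnel(1280, HYPOTHETICAL_OVERHEAD))  # True
print(fragments_on_tunnel(1480, HYPOTHETICAL_OVERHEAD))  # False
```

With any non-zero overhead the default 1280-byte 6rd MTU is too small, which is why raising the tunnel MTU removes the fragmentation.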

References