I’m writing this down in the hope that someone else trying to solve this issue via some random ‘Googling’ may find this article and that it may save them some time… compared to how long it took me to solve this!
I run a moderately sized UniFi based network consisting of a UDM-Pro, 10 UniFi switches on a fibre ring, and 19 UniFI Wireless LAN access points.
I turned on automatic software updates on the UDM-Pro a while ago, and I am presuming that the issue that has occurred here is as a result of the latest software that got auto-installed (UniFi UDM Pro Version 1.10.4 and ‘Network’ version 6.5.53).
It took me a few days to figure out that the UniFi network was the cause of broken printer operation and also the cause of our on-site Sonos audio devices all ceasing to work.
The way the ‘breakage’ occurs is subtle – some, but not all, peer to peer IP traffic over the wireless LAN was being silently dropped. Broken things included Multicast based protocols (such as AirPrint) and also initial connection establishment (with the ARP protocol’s broadcast frames apparently being filtered as well)
This impacted some network nodes but not to all of them.
The way this first manifested was that our printer stopped working. It didn’t work wireless-to-wireless, and it didn’t work when I tried cabling it into a switch and accessing it via wireless on my laptop.
The failure to work even on a wired connection convinced me the printer was just… b0rked, and that it was time to replace it.
In a similar timeframe, the Sonos audio devices in our house also stopped working. I didn’t initially register the timing coincidence with the printer fault because, frankly, Sonos network communication sometimes get flakey ‘all by itself’…and I had shelved that issue to figure out ‘later’.
In the meantime, we actually went and bought a new printer! When I fired that up, it failed to work as well (!). Ah hah… ‘lightbulb moment’. It wasn’t the printer. It was something far more fundamental about the network. It had to be.
I started digging deeper, with that new data point (that it wasn’t actually the printer at fault). Eventually, via various network diagnostic steps, I figured out that a strange network failure mode was the real issue.
Since it seemed to be impacting broadcast and multicast frames, I started looking for, and adjusting settings related to that functionality in the UDM-Pro.
Eventually I landed on the fix:
- Turn OFF the ‘AUTO OPTIMIZE NETWORK’ setting under the Settings->Site menu. This is necessary in order to unlock the ability to change the next setting
- Turn OFF the checkbox called “Block LAN to WLAN Multicast and Broadcast Traffic” on the Settings->Wireless Networks-> [ Your network name here ] -> EDIT WIRELESS NETWORK page
Ii is that latter setting that is doing the damage. When on, it is literally blocking things as fundamental as ARP broadcast packets so that IP connections can’t even be established reliably between hosts on the local network. Hosts can all talk to ‘the wider Internet’ just fine, but they can’t talk to each other.
I didn’t take any manual action to break the network – all I did was leave automatic OS updates turned on and a few days ago – these mysterious faults appeared.
See the screen shots below to help you find where those settings located on the UniFi Web console interface.
As soon as I took the steps above, the UDP-Pro re-provisioned all my UniFI WiFi access points and everything started working again – printer started working perfectly, Sonos started working fine. I proved this is the issue by turning the setting back on and – bingo – instant network faults. Off again…and it all works again.
I hope this helps someone else!
I’ve reported it to Ubiquiti because I think they need to urgently make a further update to fix this pretty fundamental bug, and hopefully they’ll indeed fix it.
In the meantime, I hope this post helps someone else avoid the head scratching (and/or throwing out perfectly good printers and/or Sonos units!).
12 Dec 2021: The ‘Network’ version has been incremented to 6.5.54 as at 12 December 2021.
Some good has been done in this incremental release: It has been kindly pointed out to me that a workaround is in there now for small sites, so at least small sites (less than 10 APs) won’t suffer this issue any longer.
The following comment has been added to the release notes for 6.5.54 (from here):
- Enable multicast block if Auto-optimize is enabled, and there are more than 10 APs assigned to SSID.
My site does have more than 10 APs assigned to the SSID concerned. So, for me, 6.5.54 still shows the issue (which explains why, in my testing, the fault wasn’t fixed in 6.5.54).
This is a good thing – sites with less than 10 APs will now no longer see this bug.
However: The bug still exists and this workaround is just hiding it until it leaps out and bites sites in the tail when they expand to 10 or more APs (on the same SSID). Argh.
This feature (when enabled) is breaking fundamental aspects of TCP/IP network operation on a routine basis.
There are two issues here.
The first is that it is quite possible for two hosts to be associated to two different APs while being in the same physical location. Picture a printer on a desk and a user of that printer who has walked in from the AP next door (and is still associated with the AP next door).
Or picture a location that happens to just be at the intersection of two roughly equidistant APs (which is going to happen all the time on a network with more than 10 APs)
When this happens, the outcome for users in terms of Multicast/Broadcast activity is going to become intermittent – sometimes it’ll block packets, and sometimes (if host hosts do happen to both be on the same AP) it might work…for a while. And then stop again mysteriously later.
This intermittency was evident in my initial testing (and now I appreciate why).
As people make their networks larger (and of course for anyone who already was running a large network and who has auto-updated), they will see this mysterious problem happen both without warning and without explanation.
I actually think that’s worse… because now its a fault that unexpectedly occurs when the network expands beyond a certain point. Pity the IT guy who has to figure that one out, with the sole clue being that one line in the Network 6.5.54 release notes.
The signature example of the seriousness of this problem is something completely fundamental to working TCP/IP networks:
This feature blocks ARP packets
As a result, the establishment of working unicast connections between hosts in the local network (e.g. as fundamental as connecting to a printer and using it) will not work reliably (or in many cases, will not work at all)… which is where we came in.
It also means, for instance, that if you try to ‘ping’ another host on your local IP range, that ping might work, or it might not, depending on where the other host is, across your network (or on whether it has roamed to another AP that is reachable in the same physical location).
Debugging that sort of thing could drive people a bit crazy.
Without consideration of this, the functionality of the feature in general is pretty broken.
I get the point, and in implementing this, Ubiquiti means well, but it has not been fully thought out and it is going to be the cause of nasty (and worse, subtle) network faults on a continuing basis until more effort is put in to how this feature works (and to allowing the operator to select Broadcast and MultiCast packet types to continue to forward!).
I’ve noticed that when turned on, this feature allows for the addition of hosts by MAC address that are still able to be visible network wide:
That is a tacit admission of how this feature breaks stuff.
Adding hosts to the exclude-from-blocking list by MAC address is well meaning, but network operators will be perpetually chasing their own tails as people add printers or audio devices (or replace busted ones). Maintaining a MAC address block list is just a ‘make work’ activity that no network administrator (or their users) needs. Not ever.
Ubiquiti has implemented extensive device ‘fingerprinting’ of devices over time. This meansthey can figure out what things are. If this feature is going to exist (and be silently turned on without warning!!) at all, then it has to be configurable in terms of device types and/or broadcast/multicast protocols that can be whitelisted, not hosts.
Again the issue here is that there are protocols (like ARP – argh) that you just can’t block between APs at all, without breaking fundamental aspects of how TCP/IP networks work.
This isn’t good, and until it is further improved, the underlying problem remains. The change in .54 does help a bit, for most people… but for the people it doesn’t help, it has made the real problem (that the feature itself is un-tenable as it stands) both harder to find (and hence harder to fix).