Subtitle: “Ubiquiti Auto-Optimise Breaks stuff again” (it is also responsible for this).
I have been having persistent, annoying and sustained issues with older Sonos devices dropping off of my WiFi network after a while. This is on a network driven entirely with Ubiquiti UniFI products (switches and access points connected to a UDM-Pro).
The older Sonos products would disappear regularly from view from the Sonos Apps.
Power cycle them and they return, but then any time from a few minutes to a few hours later… they’re gone again. Power cycle yet again. Rinse and Repeat. Grrr.
The Ubiquiti ‘auto-optimize’ function strikes again – and breaks stuff. Hmmm.
The ‘optimize’ fuction modifies the minimum data rate for 2.4Ghz ‘beacon’ frames to a rate above the 802.11b default. Doing this breaks WiFi connectivity for devices that can only use 802.11b. The older Sonos models use 802.11b. So … this breaks them.
(1) Turn off ‘auto-optimize’ (under Advanced settings in the Network settings page).
(2) Turn off the checkbox “2G Data Rate Control” on the Wireless Network page for your WiFi SSID concerned (see images below) to restore working 802.11b WiFi connectivity
The Details (if you need or want them)
I have Sonos One devices and (older) Sonos Play:5 devices. Its the Play:5’s that kept going away.
Importantly the older Sonos units are 2.4Ghz 802.11b only units devices… they can’t talk on 5Ghz (but the UniFi AP’s can bridge them to control devices running on newer devices and newer bands).
After a lot of head scratching, I realised that ‘something’ had changed a critical setting on my UDM-Pro.
The setting that was changed was the one (in the Wireless Network page) called “2G data rate control”.
This had been turned on (by the ‘auto optimizer’) and set to 5.5Mb/s minimum:
This is a minimum beacon rate that (as it says on the page) causes “Limited range and limited connectivity for 802.11b devices”
The older Sonos devices are 802.11b 2.4Ghz only devices! Hence this setting is guaranteed to break Sonos Network access – and to keep breaking it. Argh.
So: The fix is to set that minimum data rate back down to 1Mb/s, at which point the on-screen text changes to “Full Device Compatibility and Range”:
It is simpler (and more direct) to just disable the rate changing function entirely (by un-checking the box ‘Enable Minimum data rate control’.
However, I wanted to point out the changes in that explanatory text with the screen images above. These underscore that the optimiser (which is on by default) really does make changes that break older WiFi device access (and without warning the user that this is what will happen).
Note that on the UDM (vs the UDM-Pro), there are different checkboxes to restore 802.11b connectivity. On that platform, you (again) turn off the optimiser and then under your WiFi network configuration page you’ll find a switch hiding in the “Advanced” section “Enable Legacy Support”. This switch is explained with the text ‘Enable legacy device support (i.e. 11b)’. Duh. So turn that on (i.e. do enable the legacy device support).
I’m sure this is breaking connectivity for other devices too – there are going to be lots of little gadgets in people’s networks that are older or simpler and that only support 802.11b. If you want those to work… you’ll want to enable that legacy support.
Update: We have some wall mounted underfloor heating controllers (with WiFi) in the house… that were unreliable and just couldn’t hold a WiFi connection. You know where this is going … now they work just fine.
I’m writing this down in the hope that someone else trying to solve this issue via some random ‘Googling’ may find this article and that it may save them some time… compared to how long it took me to solve this!
I run a moderately sized UniFi based network consisting of a UDM-Pro, 10 UniFi switches on a fibre ring, and 19 UniFI Wireless LAN access points.
I turned on automatic software updates on the UDM-Pro a while ago, and I am presuming that the issue that has occurred here is as a result of the latest software that got auto-installed (UniFi UDM Pro Version 1.10.4 and ‘Network’ version 6.5.53).
It took me a few days to figure out that the UniFi network was the cause of broken printer operation and also the cause of our on-site Sonos audio devices all ceasing to work.
The way the ‘breakage’ occurs is subtle – some, but not all, peer to peer IP traffic over the wireless LAN was being silently dropped. Broken things included Multicast based protocols (such as AirPrint) and also initial connection establishment (with the ARP protocol’s broadcast frames apparently being filtered as well)
This impacted some network nodes but not to all of them.
The way this first manifested was that our printer stopped working. It didn’t work wireless-to-wireless, and it didn’t work when I tried cabling it into a switch and accessing it via wireless on my laptop.
The failure to work even on a wired connection convinced me the printer was just… b0rked, and that it was time to replace it.
In a similar timeframe, the Sonos audio devices in our house also stopped working. I didn’t initially register the timing coincidence with the printer fault because, frankly, Sonos network communication sometimes get flakey ‘all by itself’…and I had shelved that issue to figure out ‘later’.
In the meantime, we actually went and bought a new printer! When I fired that up, it failed to work as well (!). Ah hah… ‘lightbulb moment’. It wasn’t the printer. It was something far more fundamental about the network. It had to be.
I started digging deeper, with that new data point (that it wasn’t actually the printer at fault). Eventually, via various network diagnostic steps, I figured out that a strange network failure mode was the real issue.
Since it seemed to be impacting broadcast and multicast frames, I started looking for, and adjusting settings related to that functionality in the UDM-Pro.
Eventually I landed on the fix:
Turn OFF the ‘AUTO OPTIMIZE NETWORK’ setting under the Settings->Site menu. This is necessary in order to unlock the ability to change the next setting
Turn OFF the checkbox called “Block LAN to WLAN Multicast and Broadcast Traffic” on the Settings->Wireless Networks-> [ Your network name here ] -> EDIT WIRELESS NETWORK page
Ii is that latter setting that is doing the damage. When on, it is literally blocking things as fundamental as ARP broadcast packets so that IP connections can’t even be established reliably between hosts on the local network. Hosts can all talk to ‘the wider Internet’ just fine, but they can’t talk to each other.
I didn’t take any manual action to break the network – all I did was leave automatic OS updates turned on and a few days ago – these mysterious faults appeared.
See the screen shots below to help you find where those settings located on the UniFi Web console interface.
As soon as I took the steps above, the UDP-Pro re-provisioned all my UniFI WiFi access points and everything started working again – printer started working perfectly, Sonos started working fine. I proved this is the issue by turning the setting back on and – bingo – instant network faults. Off again…and it all works again.
I hope this helps someone else!
I’ve reported it to Ubiquiti because I think they need to urgently make a further update to fix this pretty fundamental bug, and hopefully they’ll indeed fix it.
In the meantime, I hope this post helps someone else avoid the head scratching (and/or throwing out perfectly good printers and/or Sonos units!).
12 Dec 2021: The ‘Network’ version has been incremented to 6.5.54 as at 12 December 2021.
Some good has been done in this incremental release: It has been kindly pointed out to me that a workaround is in there now for small sites, so at least small sites (less than 10 APs) won’t suffer this issue any longer.
The following comment has been added to the release notes for 6.5.54 (from here):
Enable multicast block if Auto-optimize is enabled, and there are more than 10 APs assigned to SSID.
My site does have more than 10 APs assigned to the SSID concerned. So, for me, 6.5.54 still shows the issue (which explains why, in my testing, the fault wasn’t fixed in 6.5.54).
This is a good thing – sites with less than 10 APs will now no longer see this bug.
However: The bug still exists and this workaround is just hiding it until it leaps out and bites sites in the tail when they expand to 10 or more APs (on the same SSID). Argh.
This feature (when enabled) is breaking fundamental aspects of TCP/IP network operation on a routine basis.
There are two issues here.
The first is that it is quite possible for two hosts to be associated to two different APs while being in the same physical location. Picture a printer on a desk and a user of that printer who has walked in from the AP next door (and is still associated with the AP next door).
Or picture a location that happens to just be at the intersection of two roughly equidistant APs (which is going to happen all the time on a network with more than 10 APs)
When this happens, the outcome for users in terms of Multicast/Broadcast activity is going to become intermittent – sometimes it’ll block packets, and sometimes (if host hosts do happen to both be on the same AP) it might work…for a while. And then stop again mysteriously later.
This intermittency was evident in my initial testing (and now I appreciate why).
As people make their networks larger (and of course for anyone who already was running a large network and who has auto-updated), they will see this mysterious problem happen both without warning and without explanation.
I actually think that’s worse… because now its a fault that unexpectedly occurs when the network expands beyond a certain point. Pity the IT guy who has to figure that one out, with the sole clue being that one line in the Network 6.5.54 release notes.
The signature example of the seriousness of this problem is something completely fundamental to working TCP/IP networks:
This feature blocks ARP packets
As a result, the establishment of working unicast connections between hosts in the local network (e.g. as fundamental as connecting to a printer and using it) will not work reliably (or in many cases, will not work at all)… which is where we came in.
It also means, for instance, that if you try to ‘ping’ another host on your local IP range, that ping might work, or it might not, depending on where the other host is, across your network (or on whether it has roamed to another AP that is reachable in the same physical location).
Debugging that sort of thing could drive people a bit crazy.
Without consideration of this, the functionality of the feature in general is pretty broken.
I get the point, and in implementing this, Ubiquiti means well, but it has not been fully thought out and it is going to be the cause of nasty (and worse, subtle) network faults on a continuing basis until more effort is put in to how this feature works (and to allowing the operator to select Broadcast and MultiCast packet types to continue to forward!).
I’ve noticed that when turned on, this feature allows for the addition of hosts by MAC address that are still able to be visible network wide:
That is a tacit admission of how this feature breaks stuff.
Adding hosts to the exclude-from-blocking list by MAC address is well meaning, but network operators will be perpetually chasing their own tails as people add printers or audio devices (or replace busted ones). Maintaining a MAC address block list is just a ‘make work’ activity that no network administrator (or their users) needs. Not ever.
Ubiquiti has implemented extensive device ‘fingerprinting’ of devices over time. This meansthey can figure out what things are. If this feature is going to exist (and be silently turned on without warning!!) at all, then it has to be configurable in terms of device types and/or broadcast/multicast protocols that can be whitelisted, not hosts.
Again the issue here is that there are protocols (like ARP – argh) that you just can’t block between APs at all, without breaking fundamental aspects of how TCP/IP networks work.
This isn’t good, and until it is further improved, the underlying problem remains. The change in .54 does help a bit, for most people… but for the people it doesn’t help, it has made the real problem (that the feature itself is un-tenable as it stands) both harder to find (and hence harder to fix).