Since APFS was released a few years ago, there has been no official high-level API to check whether a duplicated file is a clone, which uses no extra storage space, or a regular copy, which uses the physical storage twice. The new Purple Tree release can now handle clones: it displays them in a specific color in “Duplicates” mode and does not count them as duplicated files.
Among other improvements, Purple Tree 3.4 is now built for both Intel and Apple Silicon architectures.
The latest 3.3.4 update fixes the issue discussed in the previous post by bypassing the /System/Volumes/Data mount point.
You may have noticed that macOS Catalina asks users for permission before an app can access private data. Since Purple Tree needs access to folders containing the user’s data, such as Downloads or Pictures, you will be asked for permission for each of these folders the first time the app runs.
To be able to scan these folders, you should allow access. Purple Tree reads only the directory structure and file sizes, and reads file contents solely to compute checksums (SHA-1).
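As a rough illustration of why only sizes and checksums are needed to find duplicates, here is a minimal Python sketch. It is not Purple Tree's actual code; the function names `sha1_of_file` and `find_duplicates` are hypothetical. Files are first grouped by size, and SHA-1 is computed only for files that share a size with at least one other file.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and SHA-1 both match."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: the contents never need to be read
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha1_of_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates(os.path.expanduser("~/Downloads"))` would return one list per set of identical files. Grouping by size first means most files are never opened at all.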
Catalina introduces a separation between a write-protected system volume, seen as the main ‘Macintosh HD’ volume mounted on /, and a read/write data volume called ‘Macintosh HD – Data’, which is mounted on /System/Volumes/Data. Both volumes are seen by the user as a single drive with familiar folders: /Applications, /Users, /Library, etc. However, these folders are now firmlinks to different drive locations. Firmlinks are a new APFS feature, similar to symbolic links: they create portals or wormholes to other locations on the drive, even across volumes, while each keeps its own consistent path.
As an example, /Users/Guest and /System/Volumes/Data/Users/Guest point to the same folder, just as a symlink would, but firmlinks are seen by users (and by the Purple Tree app) as regular folders and are therefore traversed when scanning a system drive. As a result, a tree traversal from / visits the Users folder twice, displaying a huge false duplicate 😱. The same goes for /Applications, which mixes system and user-installed apps!
An update fixing this issue will come soon. Meanwhile, to prevent this in Purple Tree 3.3.3 and earlier, you should manually exclude /System/Volumes/Data in Preferences. To do so, browse to /System/Volumes and select what appears to be ‘Macintosh HD’, then check that the added path is correct. Please note that this is necessary only if you scan the whole system drive. If you just want to scan your home directory or a different drive, this exclusion is not required 🙂
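The effect of such an exclusion can be sketched in a few lines of Python. This is only an illustration of the idea, not how Purple Tree implements its Preferences setting; `walk_excluding` is a hypothetical helper name.

```python
import os


def walk_excluding(root, excluded):
    """Yield (dirpath, filenames) under root, skipping excluded directory trees."""
    excluded = {os.path.normpath(p) for p in excluded}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into an excluded directory.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        yield dirpath, filenames
```

With `walk_excluding("/", ["/System/Volumes/Data"])`, the Users folder would be reached only through /Users, so its contents are counted once.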
When you use Purple Tree, you’ll probably notice a difference between the free disk space displayed by the app and the free space displayed by Finder and other system tools; the latter is higher if you have enabled Mac storage optimization.
In this case the free space displayed by macOS can be misleading! It includes both genuinely unoccupied space and purgeable space. The latter IS occupied by data, but can be freed by the system (this concerns iCloud-based data such as pictures, apps, podcasts, etc.). The disk space displayed by Purple Tree is the same as given by the shell command:
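For illustration only, and not necessarily the command referred to above, free space as reported by the filesystem can also be read with Python's standard library. The helper names `free_space_report` and `human` are hypothetical. We would expect this figure to exclude purgeable space, unlike Finder's, but that is an assumption worth checking on your own machine.

```python
import shutil


def free_space_report(path="/"):
    """Return total, used and free bytes for the volume containing path."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def human(n_bytes):
    """Format a byte count with decimal (power-of-1000) units."""
    value = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if value < 1000 or unit == "TB":
            return f"{value:.1f} {unit}"
        value /= 1000
```

For example, `human(free_space_report("/")["free"])` gives a figure to compare against the one shown in Finder.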
APFS, or Apple File System, was introduced on the Mac in 2017 with macOS High Sierra. Some of its most important features, such as ‘clones’, aim to reduce the disk space used by files. There are some considerations to keep in mind when analyzing disk space. First, when a file is copied on the same volume, its contents are not duplicated, though those files can still be considered duplicates by Purple Tree. Second, even if such a copy is modified, its unmodified parts are not duplicated. File sizes are therefore merely an indication: the sum of all file sizes can actually be higher than the volume size itself!
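To see this effect for yourself, a small Python sketch can add up the logical size of every file and compare it with the capacity of the volume. The names `total_logical_size` and `size_report` are hypothetical, and the sketch knows nothing about clones: it simply sums what each file reports, which is exactly why the total can overshoot.

```python
import os
import shutil
import stat


def total_logical_size(root):
    """Sum the logical sizes (st_size) of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished entry
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def size_report(root):
    """Compare the summed file sizes with the capacity of root's volume."""
    logical = total_logical_size(root)
    volume = shutil.disk_usage(root).total
    return {
        "logical_bytes": logical,
        "volume_bytes": volume,
        "exceeds_volume": logical > volume,
    }
```

On a volume holding many cloned files, `size_report` may well return `exceeds_volume` as `True` even though the drive is not full.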
Read more in this article on the Apple Developer site.
Treemap visualization of hierarchical data was introduced in 1991 by Ben Shneiderman from the University of Maryland. Purple Tree uses a modified version of the Squarified layout algorithm introduced by Jarke J. van Wijk et al. This layout avoids rectangles with a high aspect ratio, which makes comparing two areas easier. Two files of equal size should appear with equal areas within the same tree layer. However, drawing nested files may alter the apparent area because of the margins, so thinner margins give better visual accuracy.
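For the curious, here is a minimal Python sketch of the published Squarified layout for a single level of the tree. It is not Purple Tree's modified version, whose details are not covered here, and the function names are our own. It assumes the sizes are positive and already sorted in descending order.

```python
def worst_ratio(row, side):
    """Worst aspect ratio among a row of areas laid along a side of this length."""
    total = sum(row)
    return max(
        max(side * side * a / (total * total), total * total / (side * side * a))
        for a in row
    )


def layout_row(row, x, y, width, height, rects):
    """Place a row along the shorter side; return the remaining free rectangle."""
    total = sum(row)
    if width >= height:
        strip = total / height  # vertical strip on the left edge
        offset = y
        for a in row:
            rects.append((x, offset, strip, a / strip))
            offset += a / strip
        return x + strip, y, width - strip, height
    strip = total / width  # horizontal strip on the top edge
    offset = x
    for a in row:
        rects.append((offset, y, a / strip, strip))
        offset += a / strip
    return x, y + strip, width, height - strip


def squarify(sizes, x, y, width, height):
    """Return one (x, y, w, h) rectangle per size, in input order."""
    scale = width * height / sum(sizes)
    areas = [s * scale for s in sizes]
    rects, row, i = [], [], 0
    while i < len(areas):
        side = min(width, height)
        candidate = row + [areas[i]]
        # Keep growing the row while doing so does not worsen its aspect ratios.
        if not row or worst_ratio(candidate, side) <= worst_ratio(row, side):
            row = candidate
            i += 1
        else:
            x, y, width, height = layout_row(row, x, y, width, height, rects)
            row = []
    if row:
        layout_row(row, x, y, width, height, rects)
    return rects
```

For example, `squarify([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4)` stacks the two largest items as 3 by 2 rectangles on the left and fills the rest of the 6 by 4 area with the others. Each rectangle's area stays proportional to its size, which is the property that makes areas comparable.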