What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system? Maybe there's no downside but it's such a huge speed boost that it would be weird to not use it otherwise, right?
>What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system?
One disadvantage is that you can't read the MFT of network shares or device emulators presenting "virtual drive letters" to the OS.
The typical (and slower) Win32 API functions FindFirstFile()/FindNextFile() used to iterate through the directory structure work at a higher level of abstraction, so they work on more targets that don't have an NTFS MFT. Indeed, if you point WizTree at an SMB network share, it will be a lot slower because it can't directly read the MFT.
It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks. Maybe that adds too much complexity and weird bugs. I notice that most of the 3rd-party "Win Explorer replacement" utilities also don't read MFT.
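To illustrate, here's a rough sketch of that dual-path idea in Python (the MFT fast path is a hypothetical stub — a real implementation needs admin rights and a full NTFS parser, and these function names are made up):

```python
import os

def scan_via_mft(root):
    # Hypothetical fast path: parse the raw $MFT of a local NTFS volume.
    # Requires admin rights and an NTFS-specific parser; not implemented here.
    raise NotImplementedError("no raw MFT access for this volume")

def scan_via_walk(root):
    # Slow but universal path: works on SMB shares, virtual drives, any FS.
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            entries.append(os.path.join(dirpath, name))
    return entries

def scan(root):
    try:
        return scan_via_mft(root)    # optimized path for local NTFS disks
    except (NotImplementedError, OSError):
        return scan_via_walk(root)   # fall back to ordinary directory iteration
```

The extra complexity the parent mentions is exactly this: two code paths that must produce identical results, with the fast one tracking an undocumented on-disk format.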
> It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks
Surely this would have been worth doing, even if it meant flushing out bugs elsewhere.
Along with the reasons others have mentioned, it would also bypass any filter driver in the file system stack (Windows has the concept of a stack of filter drivers that can sit in front of the file system or hardware) and would also ignore any permissions (ACLs) on who can see those files. There’s no way they can credibly use this technique outside of say something from SysInternals: it violates the security and layering of the operating system and its APIs.
Is there a Linux equivalent for those "filters"? I'm a bit clueless about win32 and NT sadly enough...
Would that mean that there's no way to "scope" the MFTs?
Edit:
That also makes sense, since if I got it right they aren't necessarily supposed to be consumed by userspace programs?
I guess that's why those tools always ask for admin access and basically all perms to the FS.
It's a bit sad that the user gets exposed to a much slower search and FS experience even if the system underneath has the potential to be as fast as it gets. And I don't think ReFS is intended to replace NTFS (not that it's necessarily more performant anyways)
There is no equivalent on Linux. That's why Linux has no on-access antivirus scanners (scanners that scan a file as it's opened), while this is a basic feature of every antivirus program on Windows.
Linux has device mappers (dm-crypt, dm-raid and friends). But those sit below the file system, emulating a device. Windows' file system filter drivers sit above the file system, intercepting API calls to and from the file system. That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes, etc. But you pay the price for all that flexibility in performance.
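A toy model of that layering (not the real minifilter API — just Python objects standing in for a driver stack, with invented names):

```python
class FileSystem:
    """The base 'file system' that the filter stack sits on top of."""
    def __init__(self):
        self.files = {"report.docx": b"quarterly numbers"}
    def open(self, path):
        return self.files[path]

class AuditFilter:
    """Filter-driver analogue: sees every open before the FS does."""
    def __init__(self, lower):
        self.lower = lower      # next layer down the stack
        self.log = []
    def open(self, path):
        self.log.append(path)   # keep an audit trail of accesses
        return self.lower.open(path)

class ScanFilter:
    """Antivirus-style filter: inspects contents on access."""
    def __init__(self, lower):
        self.lower = lower
    def open(self, path):
        data = self.lower.open(path)
        if b"EICAR" in data:    # pretend signature match
            raise PermissionError("blocked by on-access scanner")
        return data

# Stack: scanner -> audit -> file system. Reading the raw disk (the MFT
# analogue) would be like touching fs.files directly, skipping both layers.
fs = FileSystem()
audit = AuditFilter(fs)
stack = ScanFilter(audit)
```

Every open pays for every layer, which is where the performance cost comes from.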
> That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes
Or if you just want to generally make the filesystem so slow that everyone has to invent their own pack files just to avoid file system api calls as much as possible.
Filters are vaguely similar to things like mountpoints overlaying portions of the filesystem. E.g. in Linux you might have files in /d1/d2/{f1,f2,f3} in the root filesystem but you also have a mountpoint of a 2nd filesystem on /d1/d2 that completely changes the visibility / contents of d2. Filter drivers can do similar things (although they are not actually independent mountpoints).
You need admin permissions to read the MFT on Windows. The traditional security model of both Windows and Linux assumes that the kernel is a security barrier between system and unprivileged user, and between different unprivileged users. An admin being able to bypass security restrictions isn't traditionally seen as a problem.
Indeed, only in very recent history has the admin/root user/owner been seen as a threat to the system and the system employs defenses against them. I'm hoping that trend reverses because I really hate the direction things are going.
There are pretty good reasons to do that. We've been really lax in what is allowed to run as root/admin when in reality, those permissions should only be used when doing things like reading the MFT or snooping on all the network traffic with Wireshark. It should not be required to run as root/admin in order to install most software because installing software is a very common thing to do.
Even if you want more control over your system, I still think technically capable people would be better served by having a separate administrator account from your normal day-to-day account which you have to explicitly log into (so no UAC prompts, you need to go onto that other account and then you get the UAC prompt). Unfortunately, I think most Desktop OSes are still too unusable with this sort of workflow due to how much software insists on admin for installation.
I largely agree. I think what makes the "the user is a threat" model so difficult to me is that there is a lot of truth to it. Users often don't know enough to make good decisions.
I really like your idea of logging in separately, such that it isn't something you're going to do cavalierly. That seems like a great compromise to me! I fully agree that we way overuse admin and really don't need it for the majority of things.
> it would also bypass any filter driver in the file system stack
The main use case for filter drivers is antivirus, and that is primarily about file contents not file metadata - so if MFT access bypassed filter drivers, that might not be a major issue. I think most non-antivirus use cases are also primarily about data not metadata.
If necessary, one could even devise a design in which MFT access is combined with filter drivers - MFT scanning to find matching files, then for each matched file access its metadata via standard APIs (to ensure filter drivers are invoked) before returning to the client. That would be slower than a pure MFT scan but still faster than a scan done purely with standard APIs. A registry key could turn this on/off so sites can decide for themselves where to place the performance versus security tradeoff.
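A sketch of that hybrid design in Python, with a plain directory walk standing in for the raw MFT scan, and os.stat() standing in for the per-match "standard API" access that would re-invoke filter drivers:

```python
import fnmatch
import os

def mft_style_scan(root, pattern):
    # Stand-in for a raw MFT scan: cheaply enumerate candidate names
    # without going through the normal per-file API path.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)

def hybrid_scan(root, pattern):
    # The fast scan finds matches; the standard API then touches only
    # the matches, so filters/ACL checks run once per hit, not per file.
    results = []
    for path in mft_style_scan(root, pattern):
        try:
            st = os.stat(path)            # the "standard API" access
            results.append((path, st.st_size))
        except PermissionError:
            continue                      # caller isn't allowed to see it
    return results
```

The win is proportional to selectivity: a query matching 100 files out of a million pays the expensive per-file path only 100 times.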
> and would also ignore any permissions (ACLs) on who can see those files
They could expose an API which enables MFT scanning with some degree of ACL checking added.
If you do the ACL check as late as possible in processing the query, it would give much better performance than standard APIs that evaluate ACLs on every access. For example, suppose I want to scan a volume for all files with the extension ‘*.exe’. The API would only have to do an ACL check on each matching entry, not on every entry it considers.
There also might be reasonable situations in which ACL checking could be bypassed. For example, if I am requesting a search for files of which I am the owner, just assume the owner should have the right to read the file’s metadata. Or, if I have read permission on a directory, assume I am allowed aggregate information on the count and total size of files in that directory and its recursive subdirectories. These “bypasses” could be controlled by system settings (registry entries / group policy), so customers with higher security needs could disable them at the cost of reduced performance.
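The late-ACL-check idea, including the hypothetical owner bypass, could look something like this (all function names here are invented for illustration):

```python
def late_acl_filter(matches, user, acl_check, owner_of,
                    allow_owner_bypass=True):
    # Evaluate ACLs only on entries that already matched the query,
    # instead of on every entry the scan considered.
    visible = []
    for path in matches:
        if allow_owner_bypass and owner_of(path) == user:
            visible.append(path)    # owner may read own file's metadata
        elif acl_check(user, path):
            visible.append(path)    # full ACL check, but only per match
    return visible
```

The `allow_owner_bypass` flag plays the role of the proposed registry/group-policy setting: higher-security sites set it off and pay for a real ACL check on every match.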
Rather than putting this in the OS kernel, it could be a privileged system service which exports an API over LPC/COM/etc. Actually with that design it isn’t even necessary to wait for Microsoft to implement this, it could always be implemented as an open source project, if someone felt sufficiently motivated to do so. (Or even as a proprietary product, although I suspect that would limit its adoption, and the risk is if it takes off, Microsoft would just implement the same thing as a standard part of Windows.)
Reading the MFT directly requires Administrator permissions, and doing it correctly means reimplementing support for every nook and cranny of NTFS including things like hard links, junction/reparse/mount points, sparse files, etc.
You call that a workaround but it’s basically the best possible situation security-wise. If this didn’t work securely then it wouldn’t be possible to implement disk defragmenter or even explorer. It’s so core to Windows NT’s security model that I wouldn’t call it a workaround.
You do similar things even with more modern stacks - assign a permission to an application and grant permissions to the application to the user.
The only real concern is that Windows NT permissions are not as granular as they could be.
> Windows NT permissions are not as granular as they could be.
For objects, Windows NT permissions are ridiculously granular; e.g. GENERIC_WRITE can be mapped to a half-dozen separately settable type-specific flags, depending on the object type (file, named pipe, etc.). It’s too granular for even an administrator to make sense of, arguably, and the documentation is somewhere between bad and nonexistent. (The UI varies from decent, like the ACL editor you can access from e.g. Explorer, to “you can’t make this shit up”, like SDDL[1].)
For subjects, the situation is not good, like on every other conventional OS. You could deal with that by introducing a “user” for each app, as on Android. But I’m not aware of any attempts to do that (that would expose this mechanism in a user-visible way).
(Then there’s the UWP sandbox, which as far as I can tell is built with complete disregard for the fundamental concepts above. I don’t think it’s worth taking seriously at this time.)
I have no idea if there’s a granular object permission that could give access to the MBR of a disk. I’ve thankfully never had to dig that deep into Windows internals.
I’ve had to work with SDDL before to set up granular permissions for WMI monitoring on a whole lot of computers and my god, did it make me love the Cloud and Linux. I can’t emphasize enough how unintuitive setting these permissions is, and how that creates systemic over-privileging.
Been using the portable version of 1.4 for decades after first coming across it in some PC magazine or something like that many years ago. Not terribly pretty, but it does what I need and it still works.
One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined, they are unlikely to change significantly).
Also, it requires admin elevation to access. Anything running elevated is a potential security concern as it can access much else too.
> Why doesn't Microsoft do it in file explorer
Not sure, but it could be because that would be seen as an unfair advantage so to avoid anti-trust allegations they would have to publish the format and make stability guarantees for it, so others could use it as easily/safely. That, and the reasons above & below too.
> and why wouldn't every tool use it instead of walking through the file system?
Largely because walking the filesystem works for all filesystems, local and remote, so one tree-walk implementation covers everything. A separate scan over the MFT data, where available, is extra work to implement and support for a single filesystem; most toolmakers either don't care enough, or aren't aware of the potential speed benefit at all, so few feel compelled to bother.
> One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined that are unlikely to change significantly).
I am not going to pull every document, but the MFT structure is documented and published. I am uncertain what you mean by "external interface".
Though all the sub-pages of that state things like “[This structure is valid only for version 3 of NTFS volumes; it may be altered in future versions.]” — while it is true that any API could see breaking changes in future, this suggests that you should expect them, so I'd not call it supported in the same sense as the main file/directory access APIs, where I would not expect to see breaking changes (additional properties & functionality yes, but not existing behaviour changing).
A lot of people talking about the details, does not constitute official documentation though.
You can find a lot of articles talking about SQL Server's DBCC IND and DBCC PAGE, but that isn't official documentation – they are essentially internal functions, not supported, and could change or go away entirely despite having been around for many versions (as they have in Azure). Similarly, there are articles talking about sys.dm_db_database_page_allocations, which sort-of does the job of DBCC IND, but again this is not officially documented & supported.
> I am uncertain what you mean by "external interface".
I meant the published interface. Maybe "supported API" would have been a better phrase to use?
Though as pointed out below, there is at least some official documentation on the MFT structure.
It's probably also racy to access the raw MFT while there are concurrent programs creating new files (or deleting files). That complication can be avoided by using the ordinary OS directory iteration primitives.
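Even the ordinary primitives leave a smaller per-entry race — an entry returned by the iteration can be deleted before you stat it — which a walker has to tolerate (Python sketch):

```python
import os

def robust_total_size(root):
    # Iterate with the ordinary OS primitives; each call sees a
    # consistent view, but a file listed now may be gone by the time
    # we stat it, so handle that race per entry.
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.stat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                pass    # deleted between listing and stat: skip it
    return total
```

With a raw MFT read there's no such per-entry recovery point: the parser can follow a record that's being rewritten under it mid-scan, which is presumably why the snapshot approach below is attractive.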
Yep, but then the performance gains are completely discarded. The easiest solution is to take a snapshot with VSS, which is both fast and makes a quiesced copy of $MFT. From there, one could monitor FS changes if they wanted live updates.
With RAM sizes now, it's curious why any OS wouldn't just cache some or all of the metadata for local volumes on a block basis, rather than incur the greater resource usage of transforming on-disk structures into different in-memory ones and then caching and tracking individual entries.