Fastest way to find a full path of a given file via Powershell?


tl;dr:

This answer does not try to solve the parallel problem as asked, however:

  • A single, recursive [IO.Directory]::GetFiles() call may be fast enough, though note that if inaccessible directories are involved this is only an option in PowerShell [Core] v6.2+:
# PowerShell [Core] v6.2+
[IO.Directory]::GetFiles(
  $searchDir,
  $searchFile,
  [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
  • Pragmatically speaking (outside of, say, a coding exercise), calling robocopy is a perfectly legitimate approach - assuming you only need to run on Windows - which is as simple as (note that con is a dummy argument for the unused target-directory parameter):
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
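
The trailing .Trim() -ne '' in the command above suggests that robocopy's raw output lines carry padding whitespace and include empty lines. The sketch below reproduces just that post-processing step on made-up sample lines (the padding and paths are illustrative stand-ins, not captured robocopy output), so it runs without robocopy:

```powershell
# Made-up stand-in for robocopy's raw output lines (illustrative only).
$rawLines = @(
  ''
  "`t    `tC:\Windows\System32\drivers\etc\hosts"
  ''
  "`t    `tC:\Program Files\Git\etc\hosts"
)

# .Trim() is applied to every element (member-access enumeration);
# -ne with an array on the left-hand side acts as a filter,
# so the empty strings are dropped.
$paths = $rawLines.Trim() -ne ''
$paths
```

Because -ne filters when its left-hand operand is an array, no explicit Where-Object call is needed.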

A few points up front:

  • "but calling ROBOCOPY is a bit of cheating."

    • Arguably, using .NET APIs / WinAPI calls is just as much cheating as calling an external utility such as RoboCopy (e.g. robocopy.exe /l ...). After all, calling external programs is a core mandate of any shell, including PowerShell (and neither System.Diagnostics.Process nor its PowerShell wrapper, Start-Process, is required for that). That said, while not a problem in this case, you do lose the ability to pass and receive objects when you call an external program, and in-process operations are typically faster.
  • For timing execution of commands (measuring performance), PowerShell offers a high-level wrapper around System.Diagnostics.Stopwatch: the Measure-Command cmdlet.

  • Such performance measurements fluctuate, because PowerShell, as a dynamically resolved language, employs a lot of caches that incur overhead when they're first filled, and you generally won't know when that happens - see this GitHub issue for background information.

  • Additionally, a long-running command that traverses the file system is subject to interference from other processes running at the same time, and whether file-system information has already been cached from a previous run makes a big difference.

  • The following comparison uses a higher-level wrapper around Measure-Command, the Time-Command function, which makes it easy to compare the relative runtime performance of multiple commands.
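
As a minimal sketch of the measurement points above, the following times the same command several times with Measure-Command and prints the individual timings, which makes the run-to-run fluctuation visible. The command being timed and the run count are arbitrary choices for illustration:

```powershell
# Arbitrary sample command to time; substitute your own script block.
$cmd = { 1..10000 | ForEach-Object { $_ * 2 } }

# Time 5 separate runs; Measure-Command returns a [TimeSpan] per run
# and discards the command's own output.
$timings = foreach ($i in 1..5) {
  (Measure-Command $cmd).TotalMilliseconds
}

# Show the per-run timings plus the average; expect the values to differ.
$timings
"Average (ms): {0:N2}" -f ($timings | Measure-Object -Average).Average
```

Averaging over several runs reduces the influence of one-time cache-filling costs, but does not eliminate interference from other processes.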


The key to speeding up PowerShell code is to minimize the actual PowerShell code and offload as much of the work as possible to .NET method calls / (compiled) external programs.
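
To illustrate the principle on a small scale (a sketch with an arbitrary workload that has nothing to do with file searching), compare summing numbers in a PowerShell pipeline with a single .NET method call. Both produce the same result, but the second does the looping in compiled code:

```powershell
$numbers = [int[]] (1..10000)

# PowerShell-heavy: the script block is invoked once per element.
$sumPipeline = 0
$numbers | ForEach-Object { $sumPipeline += $_ }

# .NET-heavy: a single method call; the loop runs inside compiled code.
$sumDotNet = [System.Linq.Enumerable]::Sum($numbers)

"Pipeline: $sumPipeline; .NET: $sumDotNet"
```

Wrapping each variant in Measure-Command would show the difference in runtime on your machine.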

The following contrasts the performance of:

  • Get-ChildItem (just for contrast, we know that it is too slow)

  • robocopy.exe

  • A single, recursive call to System.IO.Directory.GetFiles(), which may be fast enough for your purposes, despite being single-threaded.

    • Note: The call below uses features only available in .NET Core 2.1+ and therefore works in PowerShell [Core] v6.2+ only. The .NET Framework version of this API doesn't allow ignoring inaccessible directories (due to lack of permission), which makes the enumeration fail if such directories are encountered.
$searchDir = 'C:\'
$searchFile = 'hosts'

# Define the commands to compare as an array of script blocks.
$cmds =
  {
    [IO.Directory]::GetFiles(
      $searchDir,
      $searchFile,
      [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
    )
  },
  {
    (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
  },
  {
    (robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
  }

Write-Verbose -vb "Warming up the cache..."

# Run one of the commands up front to level the playing field
# with respect to cached filesystem information.
$null = & $cmds[-1]

# Run the commands and compare their timings.
Time-Command $cmds -Count 1 -OutputToHost -vb

On my 2-core Windows 10 VM running PowerShell Core 7.1.0-preview.7 I get the following results; the numbers vary based on a lot of factors (not just the number of files), but should provide a general sense of relative performance (column Factor).

Note that since the file-system cache is deliberately warmed up beforehand, the numbers for a given machine will be too optimistic compared to a run without cached information.

As you can see, the PowerShell [Core] [System.IO.Directory]::GetFiles() call actually outperformed the multi-threaded robocopy call in this case.

VERBOSE: Warming up the cache...
VERBOSE: Starting 1 run(s) of:
    [IO.Directory]::GetFiles(
      $searchDir,
      $searchFile,
      [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
    )
  ...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Starting 1 run(s) of:
    (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
  ...
C:\Program Files\Git\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
VERBOSE: Starting 1 run(s) of:
    (robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
  ...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Overall time elapsed: 00:01:48.7731236

Factor Secs (1-run avg.) Command
------ ----------------- -------
1.00   22.500            [IO.Directory]::GetFiles(…
1.14   25.602            (robocopy /l $searchDir NUL $searchFile /s /mt /njh /njs /ns /nc /np).Trim() -ne ''
2.69   60.623            (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName


This is the final code I created. Runtime is now 2.8627695 sec. Limiting the parallelism to the number of logical cores gave better performance than doing a Parallel.ForEach over all subdirectories.

Instead of returning only the file path, you can add a full FileInfo object per hit to the resulting BlockingCollection.

# PowerShell sample to find all "hosts" files on partition "c:\"
cls
Remove-Variable * -ea 0
[System.GC]::Collect()
$ErrorActionPreference = "stop"

$searchDir  = "c:\"
$searchFile = "hosts"

add-type -TypeDefinition @"
using System;
using System.IO;
using System.Linq;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

public class FileSearch {
    // Unicode layout, so that non-ANSI file names are returned intact.
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
    static BlockingCollection<string> dirList {get;set;}
    static BlockingCollection<string> fileList {get;set;}

    public static BlockingCollection<string> GetFiles(string searchDir, string searchFile) {
        bool isPattern = false;
        if (searchFile.Contains("?") || searchFile.Contains("*")) {
            // Translate the wildcard pattern into an anchored regex,
            // escaping all other regex metacharacters.
            searchFile = "^" + Regex.Escape(searchFile).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
            isPattern = true;
        }
        fileList = new BlockingCollection<string>();
        dirList = new BlockingCollection<string>();
        dirList.Add(searchDir);

        // One worker per logical core; all workers share the directory queue.
        int[] threads = Enumerable.Range(1, Environment.ProcessorCount).ToArray();
        Parallel.ForEach(threads, (id) => {
            string path;
            IntPtr handle = INVALID_HANDLE_VALUE;
            WIN32_FIND_DATA fileData;
            // Wait up to 100 ms for a first directory to become available.
            if (dirList.TryTake(out path, 100)) {
                do {
                    path = path.EndsWith(@"\") ? path : path + @"\";
                    handle = FindFirstFile(path + "*", out fileData);
                    if (handle != INVALID_HANDLE_VALUE) {
                        do {
                            string name = fileData.cFileName;
                            // Skip the "." and ".." entries by name; a drive root has
                            // neither, so blindly skipping two entries would lose hits.
                            if (name == "." || name == "..") { continue; }
                            if ((fileData.dwFileAttributes & 0x10) > 0) {
                                // Directory: queue it for any worker to pick up.
                                dirList.TryAdd(path + name);
                            } else if (isPattern) {
                                if (Regex.IsMatch(name, searchFile, RegexOptions.IgnoreCase)) {
                                    fileList.TryAdd(path + name);
                                }
                            } else if (string.Equals(name, searchFile, StringComparison.OrdinalIgnoreCase)) {
                                // Case-insensitive, like file names on Windows.
                                fileList.TryAdd(path + name);
                            }
                        } while (FindNextFile(handle, out fileData));
                        FindClose(handle);
                    }
                } while (dirList.TryTake(out path));
            }
        });
        return fileList;
    }
}
"@

$fileList = [fileSearch]::GetFiles($searchDir, $searchFile)
$fileList
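
If you'd rather not change the C# code, one alternative (a sketch, and not the in-C# change mentioned above) is to convert the returned path strings to FileInfo objects on the PowerShell side. The sample below uses a temporary file as a stand-in for the search results so that it is self-contained; the variable names $paths and $fileInfos are hypothetical, and with the code above you would pipe $fileList instead:

```powershell
# Stand-in for the search result: a list with one real file path.
$tempFile = [IO.Path]::GetTempFileName()
$paths = @($tempFile)

# Cast each path string to [IO.FileInfo] to get size, timestamps, etc.
$fileInfos = $paths | ForEach-Object { [IO.FileInfo] $_ }
$fileInfos | Select-Object FullName, Length, LastWriteTime

# Clean up the stand-in file.
Remove-Item -LiteralPath $tempFile
```

This keeps the compiled search loop as lean as possible and pays the cost of creating FileInfo objects only for the hits.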