Tuesday, September 4, 2018

Create-TestFile - Fast and efficient ways to create large quantities of files and/or large-sized test files in PowerShell

Title:  Create-TestFile - Fast and efficient ways to create large quantities of files and/or large-sized test files in PowerShell

Description:  Generates 'test' files quickly and efficiently, whether you need a few very large files or millions of tiny ones.  The 'fill' of each file is one of:

  1. 'ByZero' (default) - Fastest - Fills the file with the null byte (ASCII character position 0), not the numeral 0
  2. 'ByRNG' - Slowest - Uses the .NET System.Random class to fill the file(s) with random bytes
  3. 'ByCryptoRNG' - Fastest Random/Middle Overall - Uses the .NET Cryptography.RandomNumberGenerator class to fill the file(s) with random bytes

Using the 'Verbose' parameter emits the supplied parameters, timing, and a SHA-1 hash of the file, but it significantly slows down the operation because of the hash calculation.
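
To make the three fill modes concrete, here is a minimal sketch that fills a small in-memory buffer each way instead of writing a file. The variable names and the 16-byte size are only for illustration and are not part of Create-TestFile.

```powershell
$size = 16

# 'ByZero': .NET zero-initializes new arrays, so every byte is 0x00 (not the character '0', which is 0x30)
$zeroFill = New-Object byte[] $size

# 'ByRNG': System.Random fills an existing buffer with pseudo-random bytes
$rngFill = New-Object byte[] $size
(New-Object System.Random).NextBytes($rngFill)

# 'ByCryptoRNG': the cryptographic generator fills a buffer the same way via GetBytes
$cryptoFill = New-Object byte[] $size
$crypto = [System.Security.Cryptography.RandomNumberGenerator]::Create()
try {
  $crypto.GetBytes($cryptoFill)
} finally {
  $crypto.Dispose()
}

# Print each buffer as hex so the difference is visible
'ByZero      : ' + [System.BitConverter]::ToString($zeroFill)
'ByRNG       : ' + [System.BitConverter]::ToString($rngFill)
'ByCryptoRNG : ' + [System.BitConverter]::ToString($cryptoFill)
```

The first line should be all 00 pairs, while the other two will differ on every run.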

If an explicit file size is set, it should be a multiple of 1MB, 1KB, or at least 32 bytes so the function can pick a matching buffer size for the random fill modes. Otherwise, the dynamic buffer size falls back to 1 byte and the speed of the operation will be crippled.  I can't think of a reason why someone would want to create a 2.83MB file rather than a 5MB one, but I'm sure someone has a use case.  I'll work on this.
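
As a rough illustration of why the size matters, the sketch below repeats the buffer-size checks from the script in a standalone helper and prints how many writes each size would need in the random fill modes. Get-TestFileBufferSize is a hypothetical name used only here; the script itself does this inline.

```powershell
# Hypothetical helper that repeats the buffer-size checks from Create-TestFile
function Get-TestFileBufferSize {
  param ([uint64]$FileSize)

  if (($FileSize % 1MB) -eq 0) { return 1MB }        # multiple of 1MB -> 1MB buffer
  elseif (($FileSize % 1KB) -eq 0) { return 1KB }    # multiple of 1KB -> 1KB buffer
  elseif (($FileSize % 32) -eq 0) { return 32 }      # multiple of 32 bytes -> 32B buffer
  else { return 1 }                                  # anything else -> 1B buffer (slow)
}

# Compare a few sizes, including the awkward 2.83MB case
foreach ($size in 10MB, 1KB, 96, 10, [uint64]2.83MB) {
  $buffsize = Get-TestFileBufferSize -FileSize $size
  '{0,10} bytes -> {1,7}-byte buffer, {2} write(s)' -f $size, $buffsize, ($size / $buffsize)
}
```

A size that lands on the 1-byte buffer needs one write call per byte, which is where the slowdown comes from.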


Back-Story:  I was in a debate with a coworker of mine who is an adamant Linux fan and stereotypically wary of Microsoft products.  Having previously worked for Microsoft, I'm understandably a big fan of their products, but I also like to understand the 'why' of an argument.

We were discussing file servers and he asserted that NFS was superior to SMB.  His supporting evidence was a number of links in which Linux advocates plainly said that NFS was better, but that was it.  The references were all supposition or personal feeling with no empirical data.  So I set out to do some testing of my own (it's incomplete, so that's a future blog post).  To do it properly, I needed to be able to emulate different scenarios of moving data.  Not only did I need to test very large files, I also needed to test millions of tiny files.  Furthermore, knowing that some OSs/file systems perform data deduplication at the byte and/or block level, I needed to ensure that all files were unique so my speed tests weren't being tainted (much) by a higher-level optimization.

After utilizing some Google-Fu, I found a number of Windows tools for generating test files that could be purchased, but none that were free, or at least none that were free and didn't have people complaining about how slowly the files were created.  So I set out to fix that problem and Create-TestFile is the result.

Examples:


## Create a test file with a specified name, size, and cryptographically random byte fill
Create-TestFile -FileName 'something.txt' -FileSize 10MB -FillMode ByCryptoRNG

## Create 100 test files with random file names, 1KB in size
0..99 | % { Create-TestFile }

## Create 1,000,000 test files with random file names, 10B in size
0..999999 | % { Create-TestFile -FileSize 10 }

## Create 10 test files, with random file names, 2GB in size
0..9 | % { Create-TestFile -FileSize 2GB }

## Create a single large file and look at verbose output which includes user-supplied parameters (if any), determined buffer size, fill type, and hash
## Caution -- using the Verbose flag dramatically increases function time because of the SHA-1 calculation
Create-TestFile -FileSize 2147483648 -Verbose
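
Since the point of the random fill modes is to keep deduplication from skewing a test, it can be worth checking that a batch of files really is unique. The sketch below is one way to do that, assuming a scratch directory under the temp path and files written directly with .NET so it runs on its own; the directory and file names are made up for the example.

```powershell
# Hypothetical scratch directory; nothing here is part of Create-TestFile itself
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null

# Write five 1KB files filled with cryptographically random bytes
$rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
try {
  foreach ($i in 1..5) {
    $bytes = New-Object byte[] 1024
    $rng.GetBytes($bytes)
    [System.IO.File]::WriteAllBytes((Join-Path $dir "file$i.bin"), $bytes)
  }
} finally {
  $rng.Dispose()
}

# Hash every file and keep only the hashes that occur more than once
$duplicates = @(Get-ChildItem -LiteralPath $dir -File |
  Get-FileHash -Algorithm SHA1 |
  Group-Object -Property Hash |
  Where-Object { $_.Count -gt 1 })

"Duplicate groups found: $($duplicates.Count)"

# Clean up the scratch directory
Remove-Item -LiteralPath $dir -Recurse -Force
```

Running the same hash-and-group pipeline over zero-filled files of equal size would report them all as one duplicate group, which is exactly the case a deduplicating file system can optimize away.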


Script:


function Create-TestFile {
param (
  [Parameter(Mandatory=$false)][System.String]$FileName = "$((Get-Location).Path)\$([System.IO.Path]::GetRandomFileName())",
  [Parameter(Mandatory=$false)][uint64]$FileSize = 1KB,
  [Parameter(Mandatory = $false)] [ValidateSet('ByZero','ByRNG','ByCryptoRNG')] [System.String] $FillMode = 'ByZero'
)

  ## Determine acceptable buffer size for speed/efficiency
  if (($FileSize % (1024*1024)) -eq 0) { ## 1MB buffer  
    $buffsize = 1024*1024
  } elseif (($FileSize % (1024)) -eq 0) { ## 1KB buffer
    $buffsize = 1024
  } elseif (($FileSize % (32)) -eq 0) {  ## 32B buffer
    $buffsize = 32
  } else {                ## 1B buffer
    $buffsize = 1
  }
 
  $fs = $rng = $s1 = $RngMethod = $LoopCount = $buffer = $null
  $s1 = [datetime]::now
  try {
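    ## Resolve relative paths against the PowerShell location (not the .NET process directory)
    $FileName = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($FileName)

    ## FileMode.Create truncates an existing file so no stale data is left behind
    $fs = New-Object System.IO.FileStream($FileName, [System.IO.FileMode]::Create)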

    if ($FillMode -eq 'ByZero') {

      ## Fills the file with null bytes (0x00), not the character '0' -- FASTEST Speed
      $fs.SetLength($FileSize)

    } elseif ($FillMode -eq 'ByRNG') {
   
      ## Fills the file with random bytes generated by System.Random -- SLOWEST Speed, oddly enough
      $rng = New-Object System.Random

      $RngMethod = 'NextBytes'

    } elseif ($FillMode -eq 'ByCryptoRNG') {
   
      ## Fills the file with cryptographically random bytes generated by Cryptography.RandomNumberGenerator -- FASTEST non-zero fill
      $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()

      $RngMethod = 'GetBytes'

    }


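    if ($FillMode -eq 'ByRNG' -or $FillMode -eq 'ByCryptoRNG') {
      ## Allocate the buffer once and refill it on every pass instead of re-allocating per iteration
      $buffer = New-Object byte[] ($buffsize)
      $LoopCount = $FileSize / $buffsize
      for ($x=0; $x -lt $LoopCount; $x++) {
        ## Both NextBytes and GetBytes overwrite the whole buffer with fresh random bytes
        $rng.$RngMethod($buffer)

        $fs.Write($buffer, 0, $buffer.Length)
      }
    }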


  } catch {
    Write-Error $_
  } finally {
   
    ## If the FileStream is open, close it whether or not an error occurred
    if ($null -ne $fs) {
      $fs.Close()
    }

  }
 
  #Get-Item $FileName

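  ## Only build the report (and pay for the SHA-1 hash) when verbose output is actually requested;
  ## without this guard the hash is computed on every call because the argument is evaluated first
  if ($VerbosePreference -ne 'SilentlyContinue') {
    Write-Verbose (
      [pscustomobject][ordered]@{
        'FileName' = $PSBoundParameters['FileName']
        'FileSize' = $PSBoundParameters['FileSize']
        'FillMode' = $PSBoundParameters['FillMode']
        'RngBaseClass' = $rng
        'BufferSize' = $buffsize
        'CreateSecs' = ((([datetime]::Now) - $s1).TotalSeconds)
        'Sha1Hash' = (Get-FileHash -Algorithm SHA1 -Path $FileName).Hash
      }
    )
  }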
}