There are some limitations with doing bulk inserts into SQL Server that can make it a pain. Some of the biggest ones:

  • You can't import if you have columns that change order in the source file.
  • You can't import if you have two columns with the same name in the source file.
  • If you use a non-standard delimiter like tab you may need a Schema.ini file with an entry for every file you're going to load up.

The main way around this is to do a proper bulk insert with a corresponding format file but you usually don't get provided those and they're difficult to create when they have to manage each of the above situations. Also they look extremely hard to make and understand.

But during a recent POC I had enough with a bulky import process and decided to write a simple function that would handle each of the above problems.

  1. It takes an input file name and delimiter type.
  2. It outputs an XML format file (same input file name plus an additional .fmt extension) based on the file header.
  3. All of the fields are treated as variable length text. This is fine for me because they'll be type checked and converted when I later insert them into the database table. You might also be this lucky.

You can download the full version here but it's not very long so let's quickly walk through how it works.

function Create-SqlFormatFile {
    param (
        [Parameter(Mandatory = $true)]
        $SourceFilename,
        $Delimiter = "\t"
    )

    $headerList = New-Object System.Collections.ArrayList

$headerList is going to hold an array of column names from the file. I'm using an ArrayList for this because @() arrays are immutable and every time you add something to one you waste resources.

    foreach ($base in ((Get-Content $SourceFilename | Select -First 1) -split $Delimiter)) {
        $field = $base
        $number = 0
        while ($headerList -contains $field) {
            $field = $base + ($number++)
        }
 
        [void] $headerList.Add($field)
    }

So we read the first line from the file (assuming it's a header) and start building that column list. If a column name already exists we try variations such as column0, column1, column2, etc until one of them works.

    $format = New-Object System.Text.StringBuilder
    [void] $format.AppendLine(@"
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
"@)

Now we begin preparing our format file with a standard header using a StringBuilder. Just like with the ArrayList, strings are immutable and you waste resources by doing big loops and concatenating them using +=.

    $headerList | %{
        [void] $format.AppendLine(@" 
  <FIELD ID="$_" xsi:type="CharTerm" TERMINATOR="$(if ($_ -eq $headerList[-1]) { "\r\n" } else { $Delimiter })" />
"@)
    }

The first major section of the format file is a list of columns in file order, along with their data type (variable length string in this case) and terminators. There's a little extra logic so that if we're on the last column we mark the terminator as a carriage return and line feed.

    [void] $format.AppendLine(@"
</RECORD>
<ROW>
"@)

    $headerList | %{
        [void] $format.AppendLine(@"
  <COLUMN SOURCE="$_" NAME="$_" xsi:type="SQLNVARCHAR" NULLABLE="YES" />
"@)
    }

The second major section of the format file are the columns. These refer to one of the fields above, the data type to present to SQL Server, and it's in the order you want to see in the virtual table that will be returned from your query. They don't need to include every column that was defined, but in this case we return them all.

    [void] $format.AppendLine(@"
</ROW>
</BCPFORMAT>
"@)

    Set-Content -Path "$($SourceFilename).fmt" -Value $format.ToString()
}

And before you know it the header file is finished and written to disk. Let's see it in action with a very simple tab delimited data file:

FieldA	FieldB	FieldC	FieldD	FieldA
AA1	BB1	CC1	DD1	EE1
AA2	BB2	CC2	DD2	EE2

Creating the format file.

Create-FormatFile Source.tsv
Get-Content Source.tsv.fmt

And queried in SQL Server with Openrowset. It sucks that this doesn't allow the use of a variable for the filename but such is life. At least the format file name will be predictable.

Select	*
From	Openrowset(Bulk 'C:\Temp\Source.tsv', 
		Formatfile = 'C:\Temp\Source.tsv.fmt', 
		Firstrow = 2) Import

Don't forget there are a few minor security considerations:

  • Your login needs Administer Bulk Operations permission.
  • Your AD account needs access to the file (and possibly delegation enabled for remote shares)
  • Or if using an SQL login the database engine service account needs access to the file.

Anyway now that we have the data in table format without worrying about ordering or duplicate column names, we can much more easily manipulate it and store it into the database.