If you're using VMware Fusion 7 for Mac, give up on native shared disks

Whatever their intended purpose, these are not suitable as shared storage in a Windows Server cluster; at the very least they cause the cluster to fail validation. The VMware GUI also doesn't allow you to add them to linked clones, so you'll spend a long time researching how to tweak the .vmx files by hand, and it's just not worth it.

Simulate shared disks with Microsoft iSCSI Software Target 3.3, but put it on a third host

If you share a disk from a node, you can't also access it on that same node through the iSCSI Initiator. It works fine remotely, but it gives an unhelpful error message when you try to use it locally, and you can spend a long time trying to work out what's wrong. This is documented here.

The iSCSI Initiator cannot connect to a Microsoft iSCSI Software Target that runs on the same computer.

I ended up placing it on my Domain Controller.
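On each cluster node, the connection to the target on the third host can be scripted with the built-in iscsicli tool instead of the Initiator control panel. This is only a rough sketch. The portal address 10.0.0.1 assumes the target sits on the Domain Controller at that address, and the IQN is a made-up placeholder that you would replace with whatever ListTargets reports.

```powershell
# Run on each cluster node, not on the host running the iSCSI Software Target.

# Make sure the Microsoft iSCSI Initiator service is running.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Assumption: the target host (here the Domain Controller) is reachable at 10.0.0.1.
iscsicli QAddTargetPortal 10.0.0.1

# List the targets the portal exposes and note the IQN you want.
iscsicli ListTargets

# Hypothetical IQN; substitute the name printed by ListTargets.
iscsicli QLoginTarget iqn.1991-05.com.microsoft:dc1-cluster-target
```

As far as I can tell, a quick login like this is not remembered across reboots, so you may still want to mark the target as a favorite in the Initiator GUI.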

Configure iSCSI not to enforce idle connection timeout

This is enabled by default and will cause your disks to intermittently disconnect from the cluster.

Don't Add-WindowsFeature Failover-Clustering on your sysprep image

When you add this feature it creates a network adapter with the description "Microsoft Failover Cluster Virtual Adapter". The adapter has a fixed MAC address and IP address, neither of which is updated by sysprep. Consequently, the cluster validation report will fail on the "Validate that IP addresses are unique and subnets configured correctly" step.

If you made this mistake already you can Remove-WindowsFeature and add it back, which will generate the addresses fresh.
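A minimal sketch of that remove-and-re-add cycle, to be run on each affected clone. The final check is my own addition and assumes the virtual adapter shows up in WMI under the description quoted above.

```powershell
Import-Module ServerManager

# Remove the feature from the cloned machine.
Remove-WindowsFeature Failover-Clustering

# A restart may be needed at this point; check the cmdlet output before continuing.

# Re-add the feature so the virtual adapter is created fresh on this machine.
Add-WindowsFeature Failover-Clustering

# Optional check: compare the adapter's MAC address across your nodes.
Get-WmiObject Win32_NetworkAdapter |
    Where-Object { $_.Description -like '*Failover Cluster Virtual Adapter*' } |
    Select-Object Description, MACAddress
```

If two nodes still report the same MAC address afterwards, expect the validation step to keep failing.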

Allocate more disk space than you'll ever need or use

You can't add disks later on a linked clone without hacking the .vmx manually and you probably don't want to do this. So create a dummy second disk (it's sparse so it doesn't take up any space unless you actually need to use it) with 100GB for later repurposing.

Allocate less disk space than you'll ever need or use on the iSCSI disks

This is because these can't be shrunk once you've sized them but can be easily expanded.

Additional disks on linked clones need unique IDs allocated

During the cluster validation process, any SAS drives (the SCSI drives that VMware Fusion creates when you add additional disks) have their Disk Identifier compared. This is used to determine which disks are likely shared and should be tested.

But if you added drives and then created linked clones, the identifier isn't changed on the clones, so several machines present disks with the same ID and cluster validation fails.

To fix this generate a new GUID:

([GUID]::NewGuid()).Guid | Clip

Then set it in DiskPart for each affected disk using the uniqueid disk command (in your case you'll set it to your new GUID, but for the demo I just set it to the same GUID it already had).
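The DiskPart step can be scripted roughly as follows. Treat this as a sketch under assumptions: the disk number 1 is hypothetical (check it with list disk first), and passing a GUID assumes a GPT disk, since MBR disks take an 8-digit hex signature instead.

```powershell
# Generate a fresh GUID for the cloned disk.
$newId = ([Guid]::NewGuid()).Guid

# Assumption: the affected disk is disk 1 and is GPT-initialised.
$diskNumber = 1

# Build a DiskPart script: select the disk, set the new ID, then print it back.
$lines = @(
    "select disk $diskNumber",
    "uniqueid disk id=$newId",
    "uniqueid disk"
)
$scriptPath = Join-Path $env:TEMP 'set-disk-id.txt'
$lines | Out-File -FilePath $scriptPath -Encoding ASCII

# Run from an elevated prompt.
diskpart /s $scriptPath
```

The last uniqueid disk line only displays the identifier, which makes it easy to confirm the change before re-running cluster validation.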

Disable machine account password changes in Group Policy

If you snapshot one of your VMs and then restore it 30 days later, it will fail to log in to the domain because the machine account password has changed. There's fiddling involved to get this fixed and IMHO the instructions are unclear and not straightforward (who wants to remove a machine from the domain and add it back, either?)

For lab environments it's easy to disable this functionality so that it doesn't come back to bite you. I also tweak a few other little settings to make life simpler, such as enabling ICMP so that pings work.
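If you would rather not build a GPO, the same two tweaks can be sketched per machine. Note that this is a swap: it sets the local Netlogon registry value behind the "Domain member: Disable machine account password changes" policy rather than using Group Policy itself, and the firewall rule name is just a label I picked.

```powershell
# Stop this machine from changing its machine account password.
# Local equivalent of "Domain member: Disable machine account password changes".
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
New-ItemProperty -Path $key -Name 'DisablePasswordChange' -PropertyType DWord -Value 1 -Force

# Allow inbound ICMPv4 echo requests so the machine answers pings.
# The protocol argument is quoted so PowerShell does not split it at the comma.
netsh advfirewall firewall add rule name="Allow ICMPv4 echo request" "protocol=icmpv4:8,any" dir=in action=allow
```

Run both from an elevated prompt on every lab VM you expect to roll back to an old snapshot.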

Configure internet access through the Domain Controller

It's never properly described in the CONTOSO test lab guides or anywhere else, I think because the knowledge is assumed.

Create a private network in VMware Fusion for everything in your lab to communicate over internally (vmnet2). On your Domain Controller VM, assign one network card to this network and the other network card to Bridged mode. At this point internet access on the DC should work as normal, but nothing else on the LAN is going through it yet.

Now add two features to the Domain Controller: Routing and Remote Access, and DHCP Server.

Import-Module ServerManager
Add-WindowsFeature NPAS-Routing
Add-WindowsFeature DHCP

runas /user:SAFESQL\Administrator "cmd /c %windir%\system32\dhcpmgmt.msc"
netsh.exe dhcp add securitygroups

The first command is used to open up a GUI so you can Authorize the computer to act as a DHCP Server for the domain (mine shows Unauthorize because it's already set up). This option is stupidly difficult to find because it only shows to Domain Administrators and not local Administrators on the DC nor when you right-click and Run As Administrator.

The second part is further configuration that is done if you add the feature using the Server Manager GUI but that is completely missed when you do it through PowerShell, causing tonnes of errors in the event log.

Once this is done you just need to right-click IPv4 and create a New Scope with these settings (be careful, because it's easy to skip screens):

Range: 10.0.0.2 to 10.0.0.100
Subnet mask: 255.255.255.0
Default gateway: 10.0.0.1 (make sure you click Add)
DNS Servers IP address: 10.0.0.1 (in the bottom right, make sure you click Add, and you might need to reorganize the order of these)
WINS Servers IP address: 10.0.0.1 (make sure you click Add)
Activate
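If you prefer to avoid the wizard, the same scope can be sketched with netsh. The values mirror the list above, with the range taken as 10.0.0.2 to 10.0.0.100. The subnet 10.0.0.0/24 and the scope name "Lab" are my assumptions.

```powershell
# Run on the DHCP server (the Domain Controller) as a Domain Administrator.
# Assumption: the lab subnet is 10.0.0.0/24 and the DC's internal address is 10.0.0.1.
$scope = '10.0.0.0'

# Create the scope ("Lab" is a hypothetical name) and its address range.
netsh dhcp server add scope $scope 255.255.255.0 "Lab"
netsh dhcp server scope $scope add iprange 10.0.0.2 10.0.0.100

# Scope options: 003 = router, 006 = DNS servers, 044 = WINS servers.
netsh dhcp server scope $scope set optionvalue 003 IPADDRESS 10.0.0.1
netsh dhcp server scope $scope set optionvalue 006 IPADDRESS 10.0.0.1
netsh dhcp server scope $scope set optionvalue 044 IPADDRESS 10.0.0.1

# Activate the scope.
netsh dhcp server scope $scope set state 1
```

Scripting it this way sidesteps the easy-to-miss Add buttons, though it is worth opening the DHCP console afterwards to confirm the options landed where you expect.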

At this point everything is set up. If you change your other computers in the lab to use the vmnet2 network interface, run ipconfig /release and ipconfig /flushdns, then they should all just start working - communicating with each other in the domain but also able to resolve and communicate out to the internet (as if by magic to us non-networking folk).
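On each lab machine the refresh might look like the sketch below. The ipconfig /renew call and the checks at the end are my additions, and 10.0.0.1 assumes the Domain Controller's internal address matches the scope settings.

```powershell
# Run on each lab machine after moving its network adapter to vmnet2.
ipconfig /release
ipconfig /renew      # request a new lease from the lab DHCP server
ipconfig /flushdns

# Quick checks: the address should fall in the lab range,
# the DC should answer, and external names should resolve.
ipconfig /all
ping 10.0.0.1
nslookup www.microsoft.com
```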


Jonathan Kehayias has a series of blog posts on configuration steps for a clustered setup in VirtualBox if you need further assistance. Hopefully these two guides together will save you from making some of the same mistakes I did.