Jose Barreto's Blog

How does New-SmbShare know whether the new share should be standalone, clustered or scale-out?


I got a question the other day about one of the scripts I published as part of a step-by-step for Hyper-V over SMB. Here's the relevant line from that script:

New-SmbShare -Name VMS3 -Path C:\ClusterStorage\Volume1\VMS -FullAccess FST2.Test\Administrator, FST2.Test\FST2-HV1$, FST2.Test\FST2-HV2$, FST2.Test\FST2-HVC$

The question was how New-SmbShare knows to create the share on the cluster as a continuously available share. Nothing in the cmdlet or its parameters tells it that, so the puzzled reader was asking what he had missed. It worked, but he could not figure out how.

The answer is quite simple, although it's not obvious: this is done automatically based on where the folder (specified by the -Path parameter) lives.

Here are the rules:

  • If the path is on a local, nonclustered disk, New-SmbShare creates a standalone share.
  • If the path is on a classic cluster disk, New-SmbShare creates a classic cluster file share on the group that owns that disk.
  • If the path is on a Cluster Shared Volume (CSV), New-SmbShare creates a scale-out file share.

There is actually a -ScopeName parameter for New-SmbShare, which can be used to specify the cluster name (either the network name for a classic cluster or the distributed network name, DNN, for a Scale-Out cluster), but in most cases it is entirely optional.

There is also a -ContinuouslyAvailable parameter, but it automatically defaults to $true if the share is on a cluster, so it is also optional (unless you want to create a non-CA share on a cluster, which is not a good idea anyway).
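
For illustration, here is what that same share would look like with both parameters spelled out explicitly (treat this as a sketch: the scope name FST2-SO is just a placeholder for whatever DNN your Scale-Out File Server uses, and as explained above both parameters can normally be omitted):

New-SmbShare -Name VMS3 -Path C:\ClusterStorage\Volume1\VMS -ScopeName FST2-SO -ContinuouslyAvailable:$true -FullAccess FST2.Test\Administrator, FST2.Test\FST2-HV1$, FST2.Test\FST2-HV2$, FST2.Test\FST2-HVC$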

You can read more about these automatic behaviors in SMB 3.0 at http://blogs.technet.com/b/josebda/archive/2012/10/08/windows-server-2012-file-servers-and-smb-3-0-simpler-and-easier-by-design.aspx

For more details about SMB PowerShell cmdlets, check out http://blogs.technet.com/b/josebda/archive/2012/06/27/the-basics-of-smb-powershell-a-feature-of-windows-server-2012-and-smb-3-0.aspx?Redirected=true


How to use the new SMB 3.0 WMI classes in Windows Server 2012 and Windows 8 (from PowerShell)


If you're an IT Administrator, you're likely to use the new SMB PowerShell cmdlets to manage your SMB 3.0 file shares. You can find details about those at http://blogs.technet.com/b/josebda/archive/2012/06/27/the-basics-of-smb-powershell-a-feature-of-windows-server-2012-and-smb-3-0.aspx

However, if you're a developer, you might be interested in learning about the WMI v2 classes that are behind those PowerShell cmdlets. They are easy to use and exactly match the PowerShell functionality. In fact, you can test them via PowerShell using the Get-WMIObject cmdlet. These WMIv2 classes are available for both Windows 8 and Windows Server 2012.

What is sometimes a little harder to figure out is how to find detailed information about them if you don't know where to look. The key piece of information you need is the namespace for those classes. In the case of SMB, the namespace is Root\Microsoft\Windows\SMB. Here is a sample PowerShell command to list the WMI classes in that namespace:

PS C:\Windows\system32> Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" -List "MSFT_*"

   NameSpace: ROOT\Microsoft\Windows\SMB

Name                                Methods              Properties
----                                -------              ----------
MSFT_SmbShare                       {CreateShare, Gra... {AvailabilityType, CachingMode, ...
MSFT_SmbShareAccessControlEntry     {}                   {AccessControlType, AccessRight,...
MSFT_WmiError                       {}                   {CIMStatusCode, CIMStatusCodeDes...
MSFT_ExtendedStatus                 {}                   {CIMStatusCode, CIMStatusCodeDes...
MSFT_SmbClientNetworkInterface      {}                   {FriendlyName, InterfaceIndex, I...
MSFT_SmbServerNetworkInterface      {}                   {FriendlyName, InterfaceIndex, I...
MSFT_SmbConnection                  {}                   {ContinuouslyAvailable, Credenti...
MSFT_SmbOpenFile                    {ForceClose}         {ClientComputerName, ClientUserN...
MSFT_SmbMultichannelConnection      {Refresh}            {ClientInterfaceFriendlyName, Cl...
MSFT_SmbClientConfiguration         {GetConfiguration... {ConnectionCountPerRssNetworkInt...
MSFT_SmbShareChangeEvent            {}                   {EventType, Share}
MSFT_SmbServerConfiguration         {GetConfiguration... {AnnounceComment, AnnounceServer...
MSFT_SmbSession                     {ForceClose}         {ClientComputerName, ClientUserN...
MSFT_SmbMapping                     {Remove, Create}     {LocalPath, RemotePath, Status}
MSFT_SmbMultichannelConstraint      {CreateConstraint}   {InterfaceAlias, InterfaceGuid, ...

To test one of these via PowerShell, you can use the same Get-WMIObject cmdlet. Here's a sample (with the traditional PowerShell and the WMI equivalent):

PS C:\Windows\system32> Get-SmbShare | Select Name, Path

Name                                                        Path
----                                                        ----
ADMIN$                                                      C:\Windows
C$                                                          C:\
IPC$
Users                                                       C:\Users

 

PS C:\Windows\system32> Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" MSFT_SmbShare | Select Name, Path

Name                                                        Path
----                                                        ----
ADMIN$                                                      C:\Windows
C$                                                          C:\
IPC$
Users                                                       C:\Users

Obviously the two outputs are exactly the same and the PowerShell version is much simpler. You'll only use WMI if you're really running this from an application, where using WMI classes might be simpler than invoking PowerShell.

The WMI side could get even more complex if you have to filter things or invoke a method of the WMI class (instead of simply getting properties of the returned object).
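
For example, if you only wanted a specific share, you could push the filtering down to WMI with a WQL filter instead of piping everything through Select-Object (the share name Users below is just the one from the listing above):

PS C:\Windows\system32> Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" -Class MSFT_SmbShare -Filter "Name='Users'" | Select Name, Path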

For instance, here's how you would use WMI to invoke the GetConfiguration method of the MSFT_SmbClientConfiguration class, which would be the equivalent of using the Get-SmbClientConfiguration PowerShell cmdlet:

PS C:\Windows\system32> Get-SmbClientConfiguration

ConnectionCountPerRssNetworkInterface : 4
DirectoryCacheEntriesMax              : 16
DirectoryCacheEntrySizeMax            : 65536
DirectoryCacheLifetime                : 10
EnableBandwidthThrottling             : True
EnableByteRangeLockingOnReadOnlyFiles : True
EnableLargeMtu                        : True
EnableMultiChannel                    : True
DormantFileLimit                      : 1023
EnableSecuritySignature               : True
ExtendedSessionTimeout                : 1000
FileInfoCacheEntriesMax               : 64
FileInfoCacheLifetime                 : 10
FileNotFoundCacheEntriesMax           : 128
FileNotFoundCacheLifetime             : 5
KeepConn                              : 600
MaxCmds                               : 50
MaximumConnectionCountPerServer       : 32
OplocksDisabled                       : False
RequireSecuritySignature              : False
SessionTimeout                        : 60
UseOpportunisticLocking               : True
WindowSizeThreshold                   : 8 

PS C:\Windows\system32> $cc = Invoke-WMIMethod -Namespace "root\Microsoft\Windows\SMB" -Class MSFT_SmbClientConfiguration -Name GetConfiguration
PS C:\Windows\system32> $cc

__GENUS          : 1
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        : __PARAMETERS
__PROPERTY_COUNT : 2
__DERIVATION     : {}
__SERVER         : <REMOVED>
__NAMESPACE      : ROOT\Microsoft\Windows\Smb
__PATH           : \\<REMOVED>\ROOT\Microsoft\Windows\Smb:__PARAMETERS
Output           : System.Management.ManagementBaseObject
ReturnValue      : 0
PSComputerName   : <REMOVED>

PS C:\Windows\system32> $cc.Output

__GENUS                               : 2
__CLASS                               : MSFT_SmbClientConfiguration
__SUPERCLASS                          :
__DYNASTY                             : MSFT_SmbClientConfiguration
__RELPATH                             :
__PROPERTY_COUNT                      : 23
__DERIVATION                          : {}
__SERVER                              : <REMOVED>
__NAMESPACE                           : ROOT\Microsoft\Windows\SMB
__PATH                                :
ConnectionCountPerRssNetworkInterface : 4
DirectoryCacheEntriesMax              : 16
DirectoryCacheEntrySizeMax            : 65536
DirectoryCacheLifetime                : 10
DormantFileLimit                      : 1023
EnableBandwidthThrottling             : True
EnableByteRangeLockingOnReadOnlyFiles : True
EnableLargeMtu                        : True
EnableMultiChannel                    : True
EnableSecuritySignature               : True
ExtendedSessionTimeout                : 1000
FileInfoCacheEntriesMax               : 64
FileInfoCacheLifetime                 : 10
FileNotFoundCacheEntriesMax           : 128
FileNotFoundCacheLifetime             : 5
KeepConn                              : 600
MaxCmds                               : 50
MaximumConnectionCountPerServer       : 32
OplocksDisabled                       : False
RequireSecuritySignature              : False
SessionTimeout                        : 60
UseOpportunisticLocking               : True
WindowSizeThreshold                   : 8
PSComputerName                        : <REMOVED>

You can find more information about these SMB WMI classes at http://msdn.microsoft.com/en-us/library/windows/desktop/hh830479.aspx

Is accessing files via a loopback share the same as using a local path?


Question from a user (paraphrased): When we access a local file via a loopback UNC path, is this the same as accessing it via the local path? I mean, is "C:\myfolder\a.txt" equal to "\\myserver\myshare\a.txt", or will I be using TCP/IP in some way?

Answer from SMB developer: When accessing files over loopback, the initial connect and the metadata operations (open, query info, query directory, etc.) are sent over the loopback connection. However, once a file is open we detect it and forward reads/writes directly to the file system such that TCP/IP is not used. Thus there is some difference for metadata operations, but data operations (where the majority of the data is transferred) behave just like local access.

How much traffic needs to pass between the SMB Client and Server before Multichannel actually starts?


One smart MVP was doing some testing and noticed that SMB Multichannel did not trigger immediately after an SMB session was established. So, he asked: How much traffic needs to pass between the SMB Client and Server before Multichannel actually starts?

Well... SMB Multichannel works slightly differently in that regard depending on whether the client is running Windows 8 or Windows Server 2012.

On Windows Server 2012, SMB Multichannel starts whenever an SMB read or SMB write is issued on the session (but not other operations). For servers, network fault tolerance is a key priority and sessions are typically long lasting, so we set up the extra channels as soon as we detect any read or write.

SMB Multichannel in Windows 8 will only engage if there are a few IOs in flight at the same time (technically, when the SMB window size gets to a certain point). The default for this WindowSizeThreshold setting is 8 (meaning that there are at least 8 packets asynchronously in flight). That requires some level of activity on the SMB client, so a single small file copy won't trigger it. We wanted to avoid starting Multichannel for every connection from a client, especially one doing just a small amount of work. You can query this client setting via "Get-SmbClientConfiguration" and change it with "Set-SmbClientConfiguration -WindowSizeThreshold n". You can set it to 1, for instance, to get behavior similar to Windows Server 2012.
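
For example, here's how you could check the current value on a Windows 8 client and lower it so that Multichannel engages with less concurrent IO (a sketch; adjust to taste):

Get-SmbClientConfiguration | Select WindowSizeThreshold
Set-SmbClientConfiguration -WindowSizeThreshold 1 -Confirm:$false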

Even after SMB Multichannel kicks in, the extra connections might take a few seconds to actually get established. This is because the process involves querying the server for its interface information and deciding which paths to use, and SMB does this as a low-priority activity. However, SMB traffic continues to use the initial connection and does not wait for the additional connections to be established. Once the extra connections are set up, they won't be torn down even if the activity level drops. If the session ends and is later restarted, though, the whole process starts again from scratch.
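
If you want to confirm that the extra connections were actually established, you can check from the client after generating some read or write traffic (just a quick verification sketch):

Get-SmbConnection | Select ServerName, ShareName, NumOpens
Get-SmbMultichannelConnection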

You can learn more about SMB Multichannel at http://blogs.technet.com/b/josebda/archive/2012/06/28/the-basics-of-smb-multichannel-a-feature-of-windows-server-2012-and-smb-3-0.aspx

Minimum version of Mellanox firmware required for running SMB Direct in Windows Server 2012


There are two blog posts explaining in great detail what you need to do to use Mellanox ConnectX-2 or ConnectX-3 cards to implement RDMA networking for SMB 3.0 (using SMB Direct). You can find them at:

However, I commonly get questions where the SMB cmdlets report a Mellanox NIC as not being RDMA-capable. Over time, I learned that the most common cause is outdated firmware. Windows Server 2012 comes with an inbox driver for these Mellanox adapters, but it is possible that the firmware on the adapter itself is old. This will cause the NIC to not use RDMA.

To be clear, your Mellanox NIC must have firmware version 2.9.8350 or higher to work with SMB Direct. The driver actually checks the firmware version on startup and logs a message if the firmware does not meet this requirement: "The firmware version that is burned on the device <device name> does not support Network Direct functionality. This may affect the File Transfer (SMB) performance. The current firmware version is <current version> while we recommend using firmware version 2.9.8350 or higher. Please burn a newer firmware and restart the Mellanox ConnectX device. For more details about firmware burning process please refer to Support information on http://mellanox.com".

However, since the NIC actually works fine without RDMA (at reduced performance and higher CPU utilization), some administrators might fail to identify this issue. If they follow the steps outlined in the links above, the verification steps will show that RDMA is actually not being used and that the NIC is running over TCP/IP only.
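
Here is a quick way to spot that situation from PowerShell (a sketch; run the first cmdlet on the machine with the Mellanox NIC and the other two on the SMB client and server, respectively). If the adapter shows RdmaCapable as False, suspect the firmware:

Get-NetAdapterRdma
Get-SmbClientNetworkInterface | Select FriendlyName, RdmaCapable, LinkSpeed
Get-SmbServerNetworkInterface | Select ScopeName, RdmaCapable, LinkSpeed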

The solution is obviously to download the firmware update tools from the Mellanox site and fix it. It will also come with the latest driver version, which is newer than the inbox driver. The direct link to that Mellanox page is http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=32&menu_section=34. You need to select the “Windows Server 2012” tab at the bottom of that page and download the "MLNX WinOF VPI for x64 platforms" package, shown in the picture below.

[Image: the "MLNX WinOF VPI for x64 platforms" download on the Windows Server 2012 tab of the Mellanox page]

Sample PowerShell Scripts for Storage Spaces, standalone Hyper-V over SMB and SQLIO testing


These are some PowerShell snippets to configure a specific set of systems for Hyper-V over SMB testing.
Posting it here mainly for my own reference, but maybe someone else out there is configuring a server with 48 disks split into 6 pools of 8 disks.
These systems do not support SES (SCSI Enclosure Services) so I could not use slot numbers.
This setup includes only two computers: a file server (ES-FS2) and a Hyper-V host (ES-HV1).
Obviously a standalone setup. I'm using those mainly for some SMB Direct performance testing.

Storage Spaces - Create pools

$s = Get-StorageSubSystem -FriendlyName *Spaces*

$d = Get-PhysicalDisk | ? { ";1;2;3;4;5;6;7;8;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool1 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";9;10;11;12;13;14;15;16;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool2 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";17;18;19;20;21;22;23;24;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool3 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";25;26;27;28;29;30;31;32;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool4 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";33;34;35;36;37;38;39;40;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool5 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";41;42;43;44;45;46;47;48;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool6 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d
 

Storage Spaces - Create Spaces (Virtual Disks)

1..6 | % {
Set-ResiliencySetting -Name Mirror -NumberofColumnsDefault 4 -StoragePool  ( Get-StoragePool -FriendlyName Pool$_ )
New-VirtualDisk -FriendlyName Space$_ -StoragePoolFriendlyName Pool$_ -ResiliencySettingName Mirror -UseMaximumSize
}

Initialize disks, partitions and volumes

1..6 | % {
$c = Get-VirtualDisk -FriendlyName Space$_ | Get-Disk
Set-Disk -Number $c.Number -IsReadOnly 0
Set-Disk -Number $c.Number -IsOffline 0
Initialize-Disk -Number $c.Number -PartitionStyle GPT
$L = "EFGHIJ"[$_-1]
New-Partition -DiskNumber $c.Number -DriveLetter $L -UseMaximumSize
Initialize-Volume -DriveLetter $L -FileSystem NTFS -Confirm:$false
}

Confirm everything is OK

Get-StoragePool Pool* | sort FriendlyName | % { $_ ; ($_ | Get-PhysicalDisk).Count }
Get-VirtualDisk | Sort FriendlyName
Get-VirtualDisk Space* | % { $_ | FT FriendlyName, Size, OperationalStatus, HealthStatus ; $_ | Get-PhysicalDisk | FT DeviceId, Usage, BusType, Model }
Get-Disk
Get-Volume | Sort DriveLetter

Verify SMB Multichannel configuration

Get-SmbServerNetworkInterface -CimSession ES-FS2
Get-SmbClientNetworkInterface -CimSession ES-HV1 | ? LinkSpeed -gt 1
Get-SmbMultichannelConnection -CimSession ES-HV1

On the local system, create files and run SQLIO

1..6 | % {
   $d = "EFGHIJ"[$_-1]
   $f = $d + ":\testfile.dat"
   fsutil file createnew $f (256GB)
   fsutil file setvaliddata $f (256GB)
}

c:\sqlio\sqlio2.exe -s10 -T100 -t4 -o16 -b512 -BN -LS -fsequential -dEFGHIJ testfile.dat
c:\sqlio\sqlio2.exe -s10 -T100 -t16 -o16 -b8 -BN -LS -frandom -dEFGHIJ testfile.dat

Create the SMB Shares for use by ES-HV1

1..6 | % {
   $p = "EFGHIJ"[$_-1] + ":\"
   $s = "Share"+$_
   New-SmbShare -Name $s -Path $p -FullAccess ES\User, ES\ES-HV1$
   (Get-SmbShare -Name $s).PresetPathAcl | Set-Acl
}

On the remote system, map the drives and run SQLIO again:

1..6 | % {
   $l = "EFGHIJ"[$_-1] + ":"
   $r = "\\ES-FS2\Share"+$_
   New-SmbMapping -LocalPath $l -RemotePath $r
}

c:\sqlio\sqlio2.exe -s10 -T100 -t4 -o16 -b512 -BN -LS -fsequential -dEFGHIJ testfile.dat
c:\sqlio\sqlio2.exe -s10 -T100 -t16 -o16 -b32 -BN -LS -frandom -dEFGHIJ testfile.dat

Creating the VM BASE

New-VM -Name VMBASE -VHDPath C:\VMS\BASE.VHDX -Memory 8GB
Start-VM VMBASE
Remove-VM VMBASE

Set up VMs - Option 1 - from a BASE and empty shares

1..6 | % {
   Copy C:\VMS\Base.VHDX \\ES-FS2\Share$_\VM$_.VHDX
   New-VHD -Path \\ES-FS2\Share$_\Data$_.VHDX -Fixed -Size 256GB
   New-VM -Name VM$_ -VHDPath \\ES-FS2\Share$_\VM$_.VHDX -Path \\ES-FS2\Share$_ -Memory 8GB
   Set-VM -Name VM$_ -ProcessorCount 8
   Add-VMHardDiskDrive -VMName VM$_ -Path \\ES-FS2\Share$_\Data$_.VHDX
   Add-VMNetworkAdapter -VMName VM$_ -SwitchName Internal
}

Set up VMs - Option 2 - when files are already in place

1..6 | % {
   New-VM -Name VM$_ -VHDPath \\ES-FS2\Share$_\VM$_.VHDX -Path \\ES-FS2\Share$_ -Memory 8GB
   Set-VM -Name VM$_ -ProcessorCount 8
   Add-VMHardDiskDrive -VMName VM$_ -Path \\ES-FS2\Share$_\Data$_.VHDX
   Add-VMNetworkAdapter -VMName VM$_ -SwitchName Internal
}

Setting up E: data disk inside each VM

Set-Disk -Number 1 -IsReadOnly 0
Set-Disk -Number 1 -IsOffline 0
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -DriveLetter E -UseMaximumSize
Initialize-Volume -DriveLetter E -FileSystem NTFS -Confirm:$false

fsutil file createnew E:\testfile.dat (250GB)
fsutil file setvaliddata E:\testfile.dat (250GB)

Script to run inside the VMs

The PowerShell script sqlioloop.ps1 lives on a shared X: drive (a mapped SMB share) and runs from each VM.
Each node is identified by the last byte of its IPv4 address.
An empty file named go.go works as a flag to start running the workload on several VMs at once.
The script also saves the SQLIO output to a text file on the shared X: drive.
Separate batch files are used to run SQLIO itself, so they can be easily tuned even while the script is running on all VMs.

CD X:\SQLIO
$node = (Get-NetIPAddress | ? IPaddress -like 192.168.99.*).IPAddress.Split(".")[3]
while ($true)
{
  if ((dir x:\sqlio\go.go).count -gt 0)
  {
     "Starting large IO..."
     .\sqliolarge.bat >large$node.txt
     "Pausing 10 seconds..."
      start-sleep 10
     "Starting small IO..."
      .\sqliosmall.bat >small$node.txt
     "Pausing 10 seconds..."
      start-sleep 10
   }
   "node "+$node+" is waiting..."
   start-sleep 1
}

The PowerShell script above uses the file sqliolarge.bat:

.\sqlio2.exe -s20 -T100 -t2 -o16 -b512 -BN -LS -fsequential -dE testfile.dat

It also uses sqliosmall.bat:

.\sqlio2.exe -s20 -T100 -t4 -o16 -b32 -BN -LS -frandom -dE testfile.dat

 

P.S.: The results obtained from a configuration similar to this one were published at http://blogs.technet.com/b/josebda/archive/2013/02/11/demo-hyper-v-over-smb-at-high-throughput-with-smb-direct-and-smb-multichannel.aspx

Hyper-V over SMB – Sample Configurations


This post describes a few different Hyper-V over SMB sample configurations with increasing levels of availability. Not all configurations are recommended for production deployment, since some do not provide continuous availability. The goal of the post is to show how one can add redundancy, Storage Spaces and Failover Clustering in different ways to provide additional fault tolerance to the configuration.

 

1 – All Standalone

 

[Diagram: All Standalone configuration]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Standalone, Local Storage

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost

Configuration lowlights

  • Storage not fault tolerant
  • File server not continuously available
  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

2 – All Standalone + Storage Spaces

 

[Diagram: All Standalone + Storage Spaces configuration]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Standalone, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant

Configuration lowlights

  • File server not continuously available
  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

3 – Standalone File Server, Clustered Hyper-V

 

[Diagram: Standalone File Server, Clustered Hyper-V configuration]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Standalone, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • Hyper-V VMs are highly available

Configuration lowlights

  • File server not continuously available
  • Hardware setup and OS install by IT Pro

 

4 – Clustered File Server, Standalone Hyper-V

 

[Diagram: Clustered File Server, Standalone Hyper-V configuration]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Clustered, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • File Server is Continuously Available

Configuration lowlights

  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

5 – All Clustered

 

[Diagram: All Clustered configuration]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Clustered, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • Hyper-V VMs are highly available
  • File Server is Continuously Available

Configuration lowlights

  • Hardware setup and OS install by IT Pro

 

6 – Cluster-in-a-box

 

[Diagram: Cluster-in-a-box configuration]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Cluster-in-a-box

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • File Server is Continuously Available
  • Hardware and OS pre-configured by the OEM

 

More details

 

You can find additional details on these configurations in this TechNet Radio show: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SMB-30-Deployment-Scenarios

You can also find more information about the Hyper-V over SMB scenario in this TechEd video recording: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/VIR306

Hyper-V over SMB – Performance considerations


1. Introduction

 

If you follow this blog, you probably already had a chance to review the “Hyper-V over SMB” overview talk that I delivered at TechEd 2012 and other conferences. I am now working on a new version of that talk that still covers the basics, but adds brand new segments focused on end-to-end performance and detailed sample configurations. This post looks at this new end-to-end performance portion.

 

2. Typical Hyper-V over SMB configuration

 

End-to-end performance starts by drawing an end-to-end configuration. The diagram below shows a typical Hyper-V over SMB configuration including:

  • Clients that access virtual machines
  • Nodes in a Hyper-V Cluster
  • Nodes in a File Server Cluster
  • SAS JBODs acting as shared storage for the File Server Cluster

 

[Diagram: typical Hyper-V over SMB configuration]

 

The main highlights of the diagram above include the redundancy in all layers and the different types of network connecting the layers.

 

3. Performance considerations

 

With the above configuration in mind, you can then start to consider the many different options at each layer that can affect the end-to-end performance of the solution. The diagram below highlights a few of the items, in the different layers, that would have a significant impact.

 

[Diagram: items in each layer that affect end-to-end performance]

 

These items include, in each layer:

  • Clients
    • Number of clients
    • Speed of the client NICs
  • Virtual Machines
    • VMs per host
    • Virtual processors and RAM per VM
  • Hyper-V Hosts
    • Number of Hyper-V hosts
    • Cores and RAM per Hyper-V host
    • NICs per Hyper-V host (connecting to clients) and the speed of those NICs
    • RDMA NICs (R-NICs) per Hyper-V host (connecting to file servers) and the speed of those NICs
  • File Servers
    • Number of File Servers (typically 2)
    • RAM per File Server, plus how much is used for CSV caching
    • Storage Spaces configuration, including number of spaces, resiliency settings and number of columns per space
    • RDMA NICs (R-NICs) per File Server (connecting to Hyper-V hosts) and the speed of those NICs
    • SAS HBAs per File Server (connecting to the JBODs) and speed of those HBAs
  • JBODs
    • SAS ports per module and the speed of those ports
    • Disks per JBOD, plus the speed of the disks and of their SAS connections

It’s also important to note that the goal is not to achieve the highest performance possible, but to find a balanced configuration that delivers the performance required by the workload at the best possible cost.

 

4. Sample configuration

 

To make things a bit more concrete, you can look at a sample VDI workload. Suppose you need to create a solution to host 500 VDI VMs. Here is an outline of the thought process you would have to go through (a short PowerShell sketch of the same arithmetic follows the list):

  • Workload, disks, JBODs, hosts
    • Start with the stated workload requirements: 500 VDI VMs, 2GB RAM, 1 virtual processor, ~50GB per VM, ~30 IOPS per VM, ~64KB per IO
    • Next, select the type of disk to use: 900 GB HDD at 10,000 rpm, around 140 IOPS
    • Also, the type of JBOD to use: SAS JBOD with dual SAS modules, two 4-lane 6Gbps port per module, up to 60 disks per JBOD
    • Finally, this is the agreed upon spec for the Hyper-V host: 16 cores, 128GB RAM
  • Storage
    • Number of disks required based on IOPS: 30 * 500 /140 = ~107 disks
    • Number of disks required based on capacity: 50GB * 2 * 500 / 900 = ~56 disks.
    • Some additional capacity is required for snapshots and backups.
    • Since 107 is larger than 56, IOPS is the limiting factor: 107 disks fulfill both the IOPS and capacity requirements
    • We can then conclude we need 2 JBODs with 60 disks each (that would give us 120 disks, including some spares)
  • Hyper-V hosts
    • 2 GB VM / 128GB = ~ 50 VM/host – leaving some RAM for host
    • 50 VMs * 1 virtual procs / 16 cores = ~ 3:1 ratio between virtual and physical processors.
    • 500 VMs / 50 = ~ 10 hosts – We could use 11 hosts, filling all the requirements plus one as spare
  • Networking
    • 500 VMs * 30 IOPS * 64KB = ~937 MBps required. This works well with a single 10GbE NIC, which can deliver ~1,100 MBps. Use 2 for fault tolerance.
    • A single 4-lane SAS HBA at 6Gbps delivers ~2,200 MBps. Use 2 for fault tolerance. You could actually use 3Gbps SAS HBAs here if you wanted.
  • File Server
    • 500 VMs * 30 IOPS = 15,000 IOPS. A single file server can deliver that without any problem. Use 2 for fault tolerance.
    • RAM = 64GB, a good size that allows for some CSV caching (up to 20% of RAM)
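
Here is the same arithmetic as a small PowerShell sketch, so you can plug in your own numbers (the values below are just the assumptions stated above):

$VMs = 500; $IOPSperVM = 30; $IOSize = 64KB                 # stated workload requirements
$DiskIOPS = 140; $DiskSize = 900GB; $CapacityPerVM = 50GB   # 10,000 rpm HDD assumptions
$VMsPerHost = 50                                            # ~128GB / 2GB per VM, leaving some RAM for the host

[math]::Ceiling($VMs * $IOPSperVM / $DiskIOPS)              # disks needed for IOPS (~107)
[math]::Ceiling($VMs * $CapacityPerVM * 2 / $DiskSize)      # disks needed for capacity (~56)
[math]::Ceiling($VMs / $VMsPerHost)                         # Hyper-V hosts needed (10, plus a spare)
$VMs * $IOPSperVM * $IOSize / 1MB                           # MBps needed between hosts and file servers (~937)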

Please note that this is simply an example, since your specific workload requirements may vary. There’s no general industry agreement on exactly what a VDI workload looks like, which kind of disks should be used with it, or how much RAM would work best for the Hyper-V hosts in this scenario. So, take this example with a grain of salt :-)

Obviously you could have decided to go with a different type of disk, JBOD or host. In general, higher-end equipment will handle more load, but will be more expensive. For disks, deciding factors will include price, performance, capacity and endurance. Comparing SSDs and HDDs, for instance, is an interesting exercise and that equation changes constantly as new models become available and prices fluctuate. You might need to repeat the above exercise a few times with different options to find the ideal solution for your specific workload. You might want to calculate your cost per VM for each specific iteration.

Assuming you did all that and liked the results, let’s draw it out:

 

[Diagram: sample configuration for 500 VDI VMs]

 

Now it’s up to you to work out the specific details of your own workload and hardware options.

 

5. Configuration Variations

 

It’s also important to notice that there are several potential configuration variations for the Hyper-V over SMB scenario, including:

  • Using regular Ethernet NICs instead of RDMA NICs between the Hyper-V hosts and the File Servers
  • Using a third-party SMB 3.0 NAS instead of a Windows File Server
  • Using Fibre Channel or iSCSI instead of SAS, along with a traditional SAN instead of JBODs and Storage Spaces

 

6. Speeds and feeds

 

In order to make some of the calculations, you might need to understand the maximum theoretical throughput of the interfaces involved. For instance, it helps to know that a 10GbE NIC cannot deliver more than 1.1 GBytes per second, or that a single SAS HBA sitting on an 8-lane PCIe Gen2 slot cannot deliver more than 3.4 GBytes per second. Here are some tables to help out with that portion:

 

NIC Throughput
  • 1Gb Ethernet: ~0.1 GB/sec
  • 10Gb Ethernet: ~1.1 GB/sec
  • 40Gb Ethernet: ~4.5 GB/sec
  • 32Gb InfiniBand (QDR): ~3.8 GB/sec
  • 56Gb InfiniBand (FDR): ~6.5 GB/sec

 

HBA Throughput
  • 3Gb SAS x4: ~1.1 GB/sec
  • 6Gb SAS x4: ~2.2 GB/sec
  • 4Gb FC: ~0.4 GB/sec
  • 8Gb FC: ~0.8 GB/sec
  • 16Gb FC: ~1.5 GB/sec

 

Bus Slot Throughput
  • PCIe Gen2 x4: ~1.7 GB/sec
  • PCIe Gen2 x8: ~3.4 GB/sec
  • PCIe Gen2 x16: ~6.8 GB/sec
  • PCIe Gen3 x4: ~3.3 GB/sec
  • PCIe Gen3 x8: ~6.7 GB/sec
  • PCIe Gen3 x16: ~13.5 GB/sec

 

Intel QPI Throughput
  • 4.8 GT/s: ~9.8 GB/sec
  • 5.86 GT/s: ~12.0 GB/sec
  • 6.4 GT/s: ~13.0 GB/sec
  • 7.2 GT/s: ~14.7 GB/sec
  • 8.0 GT/s: ~16.4 GB/sec

 

Memory Throughput
  • DDR2-400 (PC2-3200): ~3.4 GB/sec
  • DDR2-667 (PC2-5300): ~5.7 GB/sec
  • DDR2-1066 (PC2-8500): ~9.1 GB/sec
  • DDR3-800 (PC3-6400): ~6.8 GB/sec
  • DDR3-1333 (PC3-10600): ~11.4 GB/sec
  • DDR3-1600 (PC3-12800): ~13.7 GB/sec
  • DDR3-2133 (PC3-17000): ~18.3 GB/sec

 

Also, here is some fine print on those tables:

  • Only a few common configurations listed.
  • All numbers are rough approximations.
  • Actual throughput in real life will be lower than these theoretical maximums.
  • Numbers provided are for one way traffic only (you should double for full duplex).
  • Numbers are for one interface and one port only.
  • Numbers use base 10 (1 GB/sec = 1,000,000,000 bytes per second)
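
If you need a number that is not in the tables, a rough rule of thumb (just an approximation, consistent with the fine print above) is to divide the line rate in bits by 8 and then shave off roughly 10% for protocol overhead:

$GbitPerSec = 40                                        # line rate of the interface, in Gbit/sec
$Overhead = 0.10                                        # rough allowance for protocol overhead (assumption)
"{0:N1} GB/sec" -f ($GbitPerSec / 8 * (1 - $Overhead))  # ~4.5 GB/sec for 40Gb Ethernet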

 

7. Conclusion

 

I’m still working out the details of this new Hyper-V over SMB presentation, but this post summarizes the portion related to end-to-end performance.

I plan to deliver this talk to an internal Microsoft audience this week and also during the MVP Summit later this month. I am also considering submissions for MMS 2013 and TechEd 2013.

You can get a preview of this portion of the talk by watching this recent TechNet Radio show I recorded with Bob Hunt: Hyper-V over SMB 3.0 Performance Considerations.


Hardware options for highly available Windows Server 2012 systems using shared, directly-attached storage


Highly available Windows Server 2012 systems using shared, directly-attached storage can be built using either Storage Spaces or a validated clustered RAID controller.

 

Option 1 – Storage Spaces

You can build a highly available shared SAS system today using Storage Spaces.

Storage Spaces works well in a standalone PC, but it is also capable of working in a Windows Server Failover Clustering environment. 

For implementing Clustered Storage Spaces, you will need the following Windows Server 2012 certified hardware:

  • Any SAS Host Bus Adapter or HBA (as long as it’s SAS and not a RAID controller, you should be fine)
  • SAS JBODs or disk enclosures (listed under the “Storage Spaces” category on the Server catalog)
  • SAS disks (there’s a wide variety of those, including capacity HDDs, performance HDDs and SSDs)

You can find instructions on how to configure a Clustered Storage Space in Windows Server 2012 at http://blogs.msdn.com/b/clustering/archive/2012/06/02/10314262.aspx.
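
As a very small sketch of the Storage Spaces portion only (the pool and space names below are placeholders, and the steps to add the pool to the cluster are covered in the post linked above), you would pool the shared SAS disks and carve a mirrored space out of them roughly like this:

$s = Get-StorageSubSystem -FriendlyName *Spaces*
$d = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName ClusterPool1 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d
New-VirtualDisk -FriendlyName ClusterSpace1 -StoragePoolFriendlyName ClusterPool1 -ResiliencySettingName Mirror -UseMaximumSize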

A good overview of Storage Spaces and its capabilities can be found at http://social.technet.microsoft.com/wiki/contents/articles/15198.storage-spaces-overview.aspx

There's also an excellent presentation from TechEd that covers Storage Spaces at http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/WSV315

 

Option 2 – Clustered RAID Controllers

The second option is to build a highly available shared storage system using RAID Controllers that are designed to work in a Windows Server Failover Cluster configuration.

The main distinction between these RAID controllers and the ones we used before is that they work in sets (typically a pair) and coordinate their actions against the shared disks.

Here are some examples:

  • The HP StoreEasy 5000 cluster-in-a-box uses Clustered RAID controllers that HP sources and certifies. You can find details at the HP StoreEasy product page.
  • LSI is working on a Clustered RAID controller with Windows Server 2012 support. This new line of SAS RAID Controllers is scheduled for later this year. You can get details on availability dates from LSI.

 

Both options work great for all kinds of Windows Server 2012 Clusters, including Hyper-V Clusters, SQL Server Clusters, Classic File Server Clusters and Scale-Out File Servers.

You can learn more about these solutions in this TechEd presentation: http://channel9.msdn.com/Events/TechEd/Europe/2012/WSV310

Increasing Availability – The REAP Principles (Redundancy, Entanglement, Awareness and Persistence)


Introduction

 

Increasing availability is a key concern with computer systems. With all the consolidation and virtualization efforts under way, you need to make sure your services are always up and running, even when some components fail. However, it’s usually hard to understand the details of what it takes to make systems highly available (or continuously available). And there are so many options…

In this blog post, I will describe four principles that cover the different requirements for Availability: Redundancy, Entanglement, Awareness and Persistence. They apply to different types of services and I’ll provide some examples related to the most common server roles, including DHCP, DNS, Active Directory, Hyper-V, IIS, Remote Desktop Services, SQL Server, Exchange Server, and obviously File Services (I am in the “File Server and Clustering” team, after all). Every service employs different strategies to implement these “REAP Principles” but they all must implement them in some fashion to increase availability.

Note: A certain familiarity with common Windows Server roles and services is assumed here. If you are not familiar with the meaning of DHCP, DNS or Active Directory, this post is not intended for you. If that’s the case, you might want to do some reading on those topics before moving forward here.

 

Redundancy – There is more than one of everything

 

Availability starts with redundancy. In order to provide the ability to survive failures, you must have multiple instances of everything that can possibly fail in that system. That means multiple servers, multiple networks, multiple power supplies, multiple storage devices. You should be seeing everything (at least) doubled in your configuration. Whatever is not redundant is commonly labeled a “Single Point of Failure”.

Redundancy is not cheap, though. By definition, it will increase the cost of your infrastructure. So it’s an investment that can only be justified when there is understanding of the risks and needs associated with service disruption, which should be balanced with the cost of higher availability. Sadly, that understanding sometimes only comes after a catastrophic event (such as data loss or an extended outage).

Ideally, you would have a redundant instance that is as capable as your primary one. That would make your system work as well after the failure as it did before. It might be acceptable, though, to have a redundant component that is less capable. In that case, you’ll be in a degraded (although functional) state after a failure, while the original part is being replaced. Also keep in mind that, these days, redundancy in the cloud might be a viable option.

For this principle, there’s really not much variance per type of Windows Server role. You basically need to make sure that you have multiple servers providing the service, and make sure the other principles are applied.

 

Entanglement – Achieving shared state via spooky action at a distance


Having redundant equipment is required but certainly not sufficient to provide increased availability. Once any meaningful computer system is up and running, it is constantly gathering information and keeping track of it. If you have multiple instances running, they must be “entangled” somehow. That means that the current state of the system should be shared across the multiple instances so it can survive the loss of any individual component without losing that state. It will typically include some complex “spooky action at a distance”, as Einstein famously said of Quantum Mechanics.

A common way to do it is using a database (like SQL Server) to store your state. Every transaction performed by a set of web servers, for instance, could be stored in a common database and any web server can be quickly reprovisioned and connected to the database again. In a similar fashion, you can use Active Directory as a data store, as it’s done by services like DFS Namespaces and Exchange Server (for user mailbox information). Even a file server could serve a similar purpose, providing a location to store files that can be changed at any time and accessed by a set of web servers. If you lose a web server, you can quickly reprovision it and point it to the shared file server.

If using SQL Server to store the shared state, you must also abide by the Redundancy principle by using multiple SQL Servers, which must be entangled as well. One common way to do it is using shared storage. You can wire these servers to a Fibre Channel SAN or an iSCSI SAN or even a file server to store the data. Failover clustering in Windows Server (used by certain deployments of Hyper-V, File Servers and SQL Server, just to name a few) leverages shared storage as a common mechanism for entanglement.

Peeling the onion further, you will need multiple heads of those storage systems and they must also be entangled. Redundancy at the storage layer is commonly achieved by sharing physical disks and writing the data to multiple places. Most SANs have the option of using dual controllers that are connected to a shared set of disks. Every piece of data is stored synchronously to at least two disks (sometimes more). These SANs can tolerate the failure of individual controllers or disks, preserving their shared state without any disruption. In Windows Server 2012, Clustered Storage Spaces provides a simple solution for shared storage for a set of Windows Servers using only Shared SAS disks, without the need for a SAN.

There are other strategies for Entanglement that do not require shared storage, depending on how much and how frequently the state changes. If you have a web site with only static files, you could maintain shared state by simply provisioning multiple IIS servers with the exact same files. Whenever you lose one, simply replace it. For instance, Windows Azure and Virtual Machine Manager provide mechanisms to quickly add/remove instances of web servers in this fashion through the use of a service template.

If the shared state changes, which is often the case for most web sites, you could go up a notch by regularly copying updated files to the servers. You could have a central location with the current version of the shared state (a remote file server, for instance) plus a process to regularly send full updates to any of the nodes every day (either pushed from the central store or pulled by the servers). This is not very efficient for large amounts of data updated frequently, but could be enough if the total amount of data is small or it changes very infrequently. Examples of this strategy include SQL Server Snapshot Replication, DNS full zone transfers or a simple script using ROBOCOPY to copy files on a daily schedule.
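
As a trivial sketch of that last option (the server and share names are placeholders), a daily scheduled task on each web server could simply mirror the central copy of the content:

Robocopy.exe \\CENTRAL1\WebContent C:\inetpub\wwwroot /MIR /R:2 /W:5 /LOG:C:\Logs\content-sync.log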

In most cases, however, it’s best to employ a mechanism that can cope with more frequently changing state. Going up the scale you could have a system that sends data to its peers every hour or every few minutes, being careful to send only the data that has changed instead of the full set. That is the case for DNS incremental zone transfers, Active Directory Replication, many types of SQL Server Replication, SQL Server Log Shipping, Asynchronous SQL Server Mirroring (High-Performance Mode), SQL Server AlwaysOn Availability Groups (asynchronous-commit mode), DFS Replication and Hyper-V Replica. These models provide systems that are loosely converging, but do not achieve up-to-the-second coherent shared state. However, that is good enough for some scenarios.

At the high end of replication and right before actual shared storage, you have synchronous replication. This provides the ability to update the information on every entangled system before considering the shared state actually changed. This might slow down the overall performance of the system, especially when the connectivity between the peers suffers from latency. However, there’s something to be said of just having a set of nodes with local storage that achieve a coherent shared state using only software. Common examples here include a few types of SAN replication, Exchange Server (Database Availability Groups), Synchronous SQL Mirroring (High Safety Mode) and SQL Server AlwaysOn Availability Groups (synchronous-commit mode).

As you can see, the Entanglement principle can be addressed in a number of different ways depending on the service. Many services, like File Server and SQL Server, provide multiple mechanisms to deal with it, with varying degrees of cost, complexity, performance and coherence.

 

Awareness – Telling if Schrödinger's servers are alive or not


Your work is not done after you have a redundant, entangled system. In order to provide clients with seamless access to your service, you must implement some method to find one of the many sources of the service. The Awareness principle refers to how your clients will discover the location of the access points for your service, ideally with a mechanism to do it quickly while avoiding any failed instances. There are a few different ways to achieve it, including manual configuration, broadcast, DNS, load balancers, or a service-specific method.

One simple method is to statically configure each client with the name or IP address of two or more instances of the service. This method is effective if the configuration of the service is not expected to change. If it ever does change, you would need to reconfigure each client. A common example here is how static DNS is configured: you simply specify the IP address of your preferred DNS server and also the IP address of an alternate DNS server in case the preferred one fails.
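
On Windows 8 and Windows Server 2012, that static configuration can be applied and inspected with the DNS client cmdlets (the interface alias and addresses below are placeholders):

Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses ("192.168.1.10","192.168.1.11")
Get-DnsClientServerAddress -InterfaceAlias "Ethernet"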

Another common mechanism is to broadcast a request for the service and wait for a response. This mechanism works only if there’s someone in your local network capable of providing an answer. There’s also a concern about the legitimacy of the response, since a rogue system on the network might be used to provide a malicious version of the service. Common examples here include DHCP service requests and Wireless Access Point discovery. It is fairly common to use one service to provide awareness for others. For instance, once you access your Wireless Access Point, you get DHCP service. Once you get DHCP service, you get your DNS configuration from it.

As you know, the most common use for a DNS server is to map a network name to an IP address (using an A, AAAA or CNAME DNS record). That in itself implements a certain level of this awareness principle. DNS can also associate multiple IP addresses with a single name, effectively providing a mechanism to give you a list of servers that provide a specific service. That list is provided by the DNS server in a round robin fashion, so it even includes a certain level of load balancing as part of it. Clients looking for Web Servers and File Servers commonly use this mechanism alone for finding the many devices providing a service.

DNS also provides a different type of record specifically designed for providing service awareness. This is implemented as SRV (Service) records, which not only offer the name and IP address of a host providing a service, but can decorate it with information about priority, weight and port number where the service is provided. This is a simple but remarkably effective way to provide service awareness through DNS, which is effectively a mandatory infrastructure service these days. Active Directory is the best example of using SRV records, using DNS to allow clients to learn information about the location of Domain Controllers and services provided by them, including details about Active Directory site topology.
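
You can see those SRV records for yourself with Resolve-DnsName in Windows Server 2012 (the domain name contoso.com is a placeholder; this is how a client would locate the LDAP service offered by Domain Controllers):

Resolve-DnsName -Type SRV _ldap._tcp.dc._msdcs.contoso.com | Select Name, NameTarget, Priority, Weight, Port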

Windows Server failover clustering includes the ability to perform dynamic DNS registrations when creating clustered services. Each cluster role (formerly known as a cluster group) can include a Network Name resource, which is registered with DNS when the service is started. Multiple IP addresses can be registered for a given cluster Network Name if the server has multiple interfaces. In Windows Server 2012, a single cluster role can be active on multiple nodes (that’s the case of a Scale-Out File Server) and the new Distributed Network Name implements this as a DNS name with multiple IP addresses (at least one from each node).

DNS does have a few limitations. The main one is the fact that the clients will cache the name/IP information for some time, as specified in the TTL (time to live) for the record. If the service is reconfigured and new addresses or service records are published, DNS clients might take some time to become aware of the change. You can reduce the TTL, but that has a performance impact, causing DNS clients to query the server more frequently. There is no mechanism in DNS to have a server proactively tell a client that a published record has changed. Another issue with DNS is that it provides no method to tell if the service is actually being provided at the moment or even if the server ever functioned properly. It is up to the client to attempt communication and handle failures. Last but not least, DNS cannot help with intelligently balancing clients based on the current load of a server.

Load balancers are the next step in providing awareness. These are network devices that function as an intelligent router of traffic based on a set of rules. If you point your clients to the IP address of the load balancer, that device can intelligently forward the requests to a set of servers. As the name implies, load balancers typically distribute the clients across the servers and can even detect if a certain server is unresponsive, dynamically taking it out of the list. Another concern here is affinity, which is an optimization that consistently forwards a given client to the same server. Since these devices can become a single point of failure, the redundancy principle must be applied here. The most common solution is to have two load balancers in combination with two records in DNS.

SQL Server again uses multiple mechanisms for implementing this principle. DNS name resolution is common, both statically or dynamically using failover clustering Network Name resources. That name is then used as part of the client configuration known as a “Connection String”. Typically, this string will provide the name of a single server providing the SQL Service, along with the database name and authentication details. For instance, a typical connection string would be: "Server=SQLSERV1A; Database=DB301; Integrated Security=True;". For SQL Mirroring, there is a mechanism to provide a second server name in the connection string itself. Here’s an example: "Server=SQLSERV1A; Failover_Partner=SQLSRV1B; Database=DB301; Integrated Security=True;".

Other services provide a specific layer of Awareness, implementing a broker service or client access layer. This is the case of DFS (Distributed File System), which simplifies access to multiple file servers using a unified namespace mechanism. In a similar way, SharePoint web front end servers will abstract the fact that multiple content databases live behind a specific SharePoint farm or site collection. SharePoint Server 2013 goes one step further by implementing a Request Manager service that can even be configured as a Web Server farm placed in front of the main SharePoint web front end farm, with the purpose of routing and throttling incoming requests to improve both performance and availability.

Exchange Server Client Access Servers will query Active Directory to find which Mailbox Server or Database Availability Group contains the mailbox for an incoming client. Remote Desktop Connection Broker (formerly known as Terminal Services Session Broker) is used to provide users with access to Remote Desktop services across a set of servers. All these broker services can typically handle a fair amount of load balancing and be aware of the state of the services behind them. Since they can become single points of failure, they are typically placed behind DNS round robin and/or load balancers.

 

Persistence – The one that is the most adaptable to change will survive


Now that you have redundant, entangled services and clients that are aware of them, here comes the greatest challenge in availability: persisting the service in the event of a failure. There are three basic steps to make it happen: server failure detection, failing over to a surviving server (if required) and client reconnection (if required).

Detecting the failure is the first step. It requires a mechanism for aliveness checks, which can be performed by the servers themselves, by a witness service, by the clients accessing the services or a combination of these. For instance, Windows Server failover clustering makes cluster nodes check each other (through network checks), in an effort to determine when a node becomes unresponsive.

Once a failure is detected, for services that work in an active/passive fashion (only one server provides the service and the other remains on standby), a failover is required. This can only be safely achieved automatically if the entanglement is done via Shared Storage or Synchronous Replication, which means that the data from the server that is lost is properly persisted. If using other entanglement methods (like backups or asynchronous replication), an IT Administrator typically has to manually intervene to make sure the proper state is restored before failing over the service. For all active/active solutions, with multiple servers providing the same service all the time, a failover is not required.

Finally, the client might need to reconnect to the service. If the server being used by the client has failed, many services will lose their connections and require intervention. In an ideal scenario, the client will automatically detect (or be notified of) the server failure. Then, because it is aware of other instances of the service, it will automatically connect to a surviving instance, restoring the exact same client state as before the failure. This is how Windows Server 2012 implements failover of file servers through a process called SMB 3.0 Continuous Availability, available for both Classic and Scale-Out file server clusters. The file server cluster goes one step further, providing a Witness Service that will proactively notify SMB 3.0 clients of a server failure and point them to an alternate server, even before current pending requests to the failed server time out.
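
On the file server cluster, both pieces of this can be verified from PowerShell (a quick sketch; the output obviously depends on your shares and on which clients are currently connected). Continuous availability is a property of the share, and the Witness Service keeps track of the clients it is protecting:

Get-SmbShare | Select Name, ScopeName, ContinuouslyAvailable
Get-SmbWitnessClient | Select ClientName, NetworkName, State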

File servers might also leverage a combination of DFS Namespaces and DFS Replication that will automatically recover from a failed server situation, with some potential side effects. While the file client will find an alternative file server via DFS Namespaces, the connection state will be lost and need to be reestablished. Another persistence mechanism in the file server is the Offline Files option (also known as Client Side Caching) commonly used with the Folder Redirection feature. This allows you to keep working on local storage while your file server is unavailable, synchronizing again when the server comes back.

For other services, like SQL Server, the client will surface an error to the application indicating that a failover has occurred and the connection has been lost. If the application is properly coded to handle that situation, the end user will be shielded from error messages because the application will simply reconnect to the SQL Server using either the same name (in the case of another server taking over that name), a Failover Partner name (in the case of SQL Server Mirroring) or another instance of SQL Server (in the case of more complex log shipping or replication scenarios).

Clients of Web Servers and other load balanced workloads without any persistent state might be able to simply retry an operation in case of a failure. This might happen automatically or require the end-user to retry the operation manually. This might also be the case of a web front end layer that communicates with a web services layer. Again a savvy programmer could code that front end server to automatically retry web services requests, if they are idempotent.

Another interesting example of client persistence is provided by an Outlook client connecting to an Exchange Server. As we mentioned, Exchange Servers implement a sophisticated method of synchronous replication of mailbox databases between servers, plus a Client Access layer that brokers connections to the right set of mailbox servers. On top of that, the Outlook client will simply continue to work from its cache (using only local storage) if for any reason the server becomes unavailable. Whenever the server comes back online, the client will transparently reconnect and synchronize. The entire process is automated, without any action required during or after the failure from either end users or IT Administrators.

 

Samples of how services implement the REAP principles

 

Now that you have the principles down, let’s look at how the main services we mentioned implement them.

Service | Redundancy | Entanglement | Awareness | Persistence
DHCP, using split scopes | Multiple standalone DHCP Servers | Each server uses its own set of scopes, no replication | Active/Active. Clients find DHCP servers via broadcast (whichever responds first) | DHCP responses are cached. Upon failure, only surviving servers will respond to the broadcast
DHCP, using failover cluster | Multiple DHCP Servers in a failover cluster | Shared block storage (FC, iSCSI, SAS) | Active/Passive. Clients find DHCP servers via broadcast | DHCP responses are cached. Upon failure, failover occurs and a new server responds to broadcasts
DNS, using zone transfers | Multiple standalone DNS Servers | Zone transfers between DNS Servers at regular intervals | Active/Active. Clients configured with IP addresses of Primary and Alternate servers (static or via DHCP) | DNS responses are cached. If a query to the primary DNS server fails, the alternate DNS server is used
DNS, using Active Directory integration | Multiple DNS Servers in a Domain | Active Directory Replication | Active/Active. Clients configured with IP addresses of Primary and Alternate servers (static or via DHCP) | DNS responses are cached. If a query to the primary DNS server fails, the alternate DNS server is used
Active Directory | Multiple Domain Controllers in a Domain | Active Directory Replication | Active/Active. DC Locator service finds the closest Domain Controller using DNS service records | Upon failure, DC Locator service finds a new Domain Controller
File Server, using DFS (Distributed File System) | Multiple file servers, linked through DFS. Multiple DFS servers. | DFS Replication maintains file server data consistency. DFS Namespace links stored in Active Directory. | Active/Active. DFS Namespace used to translate namespace targets into the closest file server | Upon failure of a file server, client uses an alternate file server target. Upon DFS Namespace failure, an alternate is used
File Server for general use, using failover cluster | Multiple File Servers in a failover cluster | Shared Storage (FC, iSCSI, SAS) | Active/Passive. Name and IP address resources, published to DNS | Failover, SMB Continuous Availability, Witness Service
File Server, using Scale-Out Cluster | Multiple File Servers in a failover cluster | Shared Storage, Cluster Shared Volume (FC, iSCSI, SAS) | Active/Active. Name resource published to DNS (Distributed Network Name) | SMB Continuous Availability, Witness Service
Web Server, static content | Multiple Web Servers | Initial copy only | Active/Active. DNS round robin, load balancer or combination | Client retry
Web Server, file server back-end | Multiple Web Servers | Shared File Server back end | Active/Active. DNS round robin, load balancer or combination | Client retry
Web Server, SQL Server back-end | Multiple Web Servers | SQL Server database | Active/Active. DNS round robin, load balancer or combination | Client retry
Hyper-V, failover cluster | Multiple servers in a cluster | Shared Storage (FC, iSCSI, SAS, SMB File Share) | Active/Passive. Clients connect to the IP exposed by the VM | VM restarted upon failure
Hyper-V, Replica | Multiple servers | Replication, per VM | Active/Passive. Clients connect to the IP exposed by the VM | Manual failover (test option available)
SQL Server, Replication | Multiple servers | Replication, per database (several methods) | Active/Active. Clients connect by server name | Application may detect failures and switch servers
SQL Server, Log Shipping | Multiple servers | Log shipping, per database | Active/Passive. Clients connect by server name | Manual failover
SQL Server, Mirroring | Multiple servers, optional witness | Mirroring, per database | Active/Passive. Failover Partner specified in connection string | Automatic failover if synchronous, with witness. Application needs to reconnect
SQL Server, AlwaysOn Failover Cluster Instances | Multiple servers in a cluster | Shared Storage (FC, iSCSI, SAS, SMB File Share) | Active/Passive. Name and IP address resources, published to DNS | Automatic failover. Application needs to reconnect
SQL Server, AlwaysOn Availability Groups | Multiple servers in a cluster | Mirroring, per availability group | Active/Passive. Availability Group listener with a Name and IP address, published to DNS | Automatic failover if using synchronous-commit mode. Application needs to reconnect
SharePoint Server (web front end) | Multiple Servers | SQL Server storage | Active/Active. DNS round robin, load balancer or combination | Client retry
SharePoint Server (request manager) | Multiple Servers | SQL Server storage | Active/Active. Request manager combined with a load balancer | Client retry
Exchange Server (DAG) with Outlook | Multiple Servers in a Cluster | Database Availability Groups (Synchronous Replication) | Active/Active. Client Access Point (uses AD for Mailbox/DAG information). Names published to DNS | Outlook client goes into cached mode, reconnects

 

Conclusion

 

I hope this post helped you understand the principles behind increasing server availability.

As a final note, please take into consideration that not all services require the highest possible level of availability. This might be an easier decision for certain services like DHCP, DNS and Active Directory, where the additional cost is relatively small and the benefits are sizable. You might want to think twice before increasing the availability of a large backup server, where a few hours of downtime might be acceptable and the cost of duplicating the infrastructure is significantly higher.

Depending on how much availability your service level agreement requires, you might need different types of solutions. We generally measure availability in “nines”, as described in the table below (the arithmetic behind it is sketched in PowerShell right after the table):

Nines | Availability | Downtime per year | Downtime per week
1 | 90% | ~ 36 days | ~ 16 hours
2 | 99% | ~ 3.6 days | ~ 90 minutes
3 | 99.9% | ~ 8 hours | ~ 10 minutes
4 | 99.99% | ~ 52 minutes | ~ 1 minute
5 | 99.999% | ~ 5 minutes | ~ 6 seconds
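If you want to check the math yourself, here is a minimal sketch of the arithmetic behind the table:

# Downtime per year (hours) and per week (minutes) for a given number of nines
1..5 | % {
    $unavailability = [math]::Pow(10, -$_)               # e.g. 3 nines = 0.001 unavailable
    $downYearHours = $unavailability * 365 * 24
    $downWeekMinutes = $unavailability * 7 * 24 * 60
    "{0} nines: {1:N1} hours/year, {2:N1} minutes/week" -f $_, $downYearHours, $downWeekMinutes
}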

You should consider your overall requirements and the related infrastructure investments that would give you the most “nines” per dollar.

Updated Links on Windows Server 2012 R2 File Server and SMB 3.0


In this post, I'm providing a reference to the most relevant content related to Windows Server 2012 R2 that is related to the File Server, the SMB 3.0 features and its associated scenarios like Hyper-V over SMB and SQL Server over SMB. It's obviously not a complete reference (there are always new blogs and articles being posted), but hopefully this is a useful collection of links for Windows Server 2012 R2 users.

This post covers only articles that are specific to Windows Server 2012 R2. However, note that there’s also a Windows Server 2012 version of this post. Most concepts, step-by-steps and tools listed there also apply to Windows Server 2012 R2.

 

Overview articles on Windows Server 2012 R2 File Server and SMB 3.0 (and related topics)

 

Step-by-step instructions for Windows Server 2012 R2 File Server and SMB 3.0 (and related topics)

 

TechEd 2013 presentations (with video recording) on Windows Server 2012 R2 File Server and SMB 3.0 (and related topics)

 

Demos, Interviews and other video recordings

 

Windows Server 2012 R2 download links

 

Blog posts by Microsoft MVPs on Windows Server 2012 R2 File Server and SMB 3.0 (and related topics)

 

Windows Server 2012 R2 File Server Tips and Q&A

 

Protocol Documentation Preview

 

Other relevant links related to Windows Server 2012 R2 SMB features

 

-------

 

Change tracking:

Windows Server 2012 R2 Storage: Step-by-step with Storage Spaces, SMB Scale-Out and Shared VHDX (Physical)


This post is a part of the nine-part “What’s New in Windows Server & System Center 2012 R2” series that is featured on Brad Anderson’s In the Cloud blog.  Today’s blog post covers Windows Server 2012 R2 Storage and how it applies to the larger topic of “Transform the Datacenter.”  To read that post and see the other technologies discussed, read today’s post: “What’s New in 2012 R2: IaaS Innovations.”

 

1) Overview

 

In this document, I am sharing all the steps I used to create a Windows Server 2012 R2 File Server environment, so you can try the new features like SMB Scale-Out Rebalancing and Shared VHDX for Guest Clustering. This configuration uses seven physical computers and a JBOD for shared storage. If you're not familiar with these technologies, I would strongly encourage you to review the TechEd 2013 talks on Hyper-V over SMB (Understanding the Hyper-V over SMB Scenario, Configurations, and End-to-End Performance) and Shared VHDX (second half of Application Availability Strategies for the Private Cloud).

 

The demo setup includes the following:

  • 1 DNS / Domain Controller
  • 1 JBOD with SAS disks (suggested at least 4 HDDs with 250GB each)
  • 3 Scale-Out File Server Cluster nodes
  • 2 Hyper-V cluster nodes
  • 1 VMM Server

 

Here’s a diagram of the setup:

clip_image002

 

Here are the details about the names, roles and IP addresses for each of the computers involved:

 

Computer | Role | Net1 (Corp) | Net2 (DC/DNS) | NetR1 (RDMA)
JOSE-D | DNS, DC | DHCP | 192.168.100.1/24 | 192.168.101.1/24
JOSE-A1 | Cluster A Node 1 | DHCP | 192.168.100.11/24 | 192.168.101.11/24
JOSE-A2 | Cluster A Node 2 | DHCP | 192.168.100.12/24 | 192.168.101.12/24
JOSE-A3 | Cluster A Node 3 | DHCP | 192.168.100.13/24 | 192.168.101.13/24
JOSE-A | Cluster A CNO | N/A | 192.168.100.19/24 | 192.168.101.19/24
JOSE-F | Scale-Out File Server DNN | N/A | N/A | N/A
JOSE-B1 | Cluster B Node 1 | DHCP | 192.168.100.21/24 | 192.168.101.21/24
JOSE-B2 | Cluster B Node 2 | DHCP | 192.168.100.22/24 | 192.168.101.22/24
JOSE-B | Cluster B CNO | N/A | 192.168.100.29/24 | 192.168.101.29/24
JOSE-V | VMM Server | DHCP | 192.168.100.31/24 | 192.168.101.31/24
JOSE-X1 | Cluster X, VM Node 1 | DHCP | 192.168.100.41/24 | N/A
JOSE-X2 | Cluster X, VM Node 2 | DHCP | 192.168.100.42/24 | N/A
JOSE-X | Guest Cluster X CNO | DHCP | 192.168.100.49/24 | N/A
JOSE-XS | SQL Server, Cluster X | DHCP | 192.168.100.48/24 | N/A

 

Following these steps will probably require a few hours of work end-to-end, but it is a great way to experiment with a large set of Microsoft technologies in or related to Windows Server 2012 R2, including:

  • Hyper-V
  • Networking
  • Domain Name Services (DNS)
  • Active Directory Domain Services (AD-DS)
  • Shared VHDX
  • Storage Spaces
  • Failover Clustering
  • File Servers
  • PowerShell
  • SQL Server
  • Virtual Machine Manager

 

2) Required Hardware and Software, plus Disclaimers

 

You will need the following hardware to perform the steps described here:

 

You will need the following software to perform the steps described here:

 

Notes and disclaimers:

  • A certain familiarity with Windows administration and configuration is assumed. If you're new to Windows, this document is not for you. Sorry...
  • If you are asked a question or required to perform an action that you do not see described in these steps, go with the default option.
  • There are usually several ways to perform a specific configuration or administration task. What I describe here is one of those many ways. It's not necessarily the best way, just the one I personally like best at the moment.
  • For the most part, I use PowerShell to configure the systems. You can also use a graphical interface instead, but I did not describe those steps here.
  • The JBOD used must be certified for use with Storage Spaces. For a list of certified JBODs, check the Windows Server Catalog at http://www.windowsservercatalog.com/results.aspx?bCatID=1573&cpID=0&avc=10&OR=1
  • The Storage Spaces configuration shown here uses only HDDs. You could enhance it to use both SSDs and HDDs, in order to leverage new Storage Spaces features like Tiering or Write-back Cache.
  • The configuration uses three file servers to demonstrate the new SMB Scale-Out Rebalancing capabilities when adding a node to a two-node file server cluster. It uses only two Hyper-V nodes because that’s the minimum required for a fault-tolerant Shared VHDX deployment. Typical setups will use a two-node file server cluster and a larger number of Hyper-V cluster nodes.

 

3) Summarized instructions for experts

 

If you are already familiar with Failover Clustering, Scale-Out File Servers and Hyper-V, here’s a short introduction to what it takes to configure a Shared VHDX environment with a Scale-Out File Server. A detailed step-by-step is available in the following sections of this document.

  • Configure a three-node Scale-Out File Server Cluster using Windows Server 2012 R2 and Clustered Storage Spaces with Shared SAS disks.
  • Configure a Hyper-V Failover cluster using Windows Server 2012 R2 with SMB shared storage as you normally would.
  • Create an OS VHD or VHDX file as you regularly would, on an SMB file share. Create your VMs as you regularly would, using the OS VHD or VHDX file. Both Generation 1 and Generation 2 VMs are supported.
  • Create your VHDX data files to be shared as fixed-size or dynamically expanding, on a Clustered Shared Volume. Old VHD files or differencing disks are not supported with Shared VHDX.
  • Add the shared VHDX data files to multiple VMs, using the Add-VMHardDiskDrive and the “-ShareVirtualDisk” option. If using the GUI, check the box for “Share virtual disk” when adding the VHDX data files to the VM.
  • Inside the VM, install Windows Server 2012 R2 or Windows Server 2012. Make sure to install the Hyper-V integration components.
  • Validate and configure the cluster inside the VMs as you normally would. The guest cluster can use classic cluster disks or Cluster Shared Volumes. Any cluster role should work as it would in a physical machine with shared SAS storage.

 

The rest of this document contains detailed step-by-step instructions for each of the items outlined above.

 

4) Configure JOSE-D (DNS, DC)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-D -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature AD-Domain-Services, RSAT-ADDS, RSAT-ADDS-Tools

# Rename DHCP network adapter to Net1, private to Net2
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.1 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.1 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Create AD forest, reboots at the end
Install-ADDSForest -CreateDNSDelegation:$false -DatabasePath "C:\Windows\NTDS" -DomainMode "Win2012" -DomainName "JOSE.TEST" -DomainNetBIOSName "JOSE" -ForestMode "Win2012" -InstallDNS:$true -LogPath "C:\Windows\NTDS" -SYSVOLPath "C:\Windows\SYSVOL"

 

 

5) Configure JOSE-A1 (File Server Cluster A, Node 1)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-A1 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.11 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.11 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Configure Storage Spaces. Assumes JBOD with 4 x 250GB HDDs = 1TB total raw. 2 mirror columns. 80GB spaces.
# Adjust the size of the volumes and the number of mirror columns based on the actual number of HDDs you have.

$s = Get-StorageSubSystem -FriendlyName *Spaces*
New-StoragePool -FriendlyName Pool1 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
Set-ResiliencySetting -Name Mirror -NumberofColumnsDefault 2 -StoragePool (Get-StoragePool -FriendlyName Pool1)

New-VirtualDisk -FriendlyName Space1 -StoragePoolFriendlyName Pool1 -ResiliencySettingName Mirror –Size 1GB
2..7 | % { New-VirtualDisk -FriendlyName Space$_ -StoragePoolFriendlyName Pool1 -ResiliencySettingName Mirror –Size 80GB }

1..7 | % {
    $Letter = "PQRSTUV"[($_-1)]
    $Number = (Get-VirtualDisk -FriendlyName Space$_ | Get-Disk).Number
    Set-Disk -Number $Number -IsReadOnly 0
    Set-Disk -Number $Number -IsOffline 0
    Initialize-Disk -Number $Number -PartitionStyle MBR
    New-Partition -DiskNumber $Number -DriveLetter $Letter -UseMaximumSize
    Initialize-Volume -DriveLetter $Letter -FileSystem NTFS -Confirm:$false
}

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

 

 

6) Configure JOSE-A2 (File Server Cluster A, Node 2)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-A2 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.12 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.12 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

 

 

7) Configure JOSE-A3 (File Server Cluster A, Node 3)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-A3 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.13 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.13 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

 

8) Configure File Server Cluster JOSE-A

 

# Validate cluster
Test-Cluster -Node JOSE-A1, JOSE-A2, JOSE-A3

# Create cluster
New-Cluster –Name JOSE-A -Node JOSE-A1, JOSE-A2, JOSE-A3

# Rename and configure networks
(Get-ClusterNetwork | ? {$_.Address -notlike "192.*" }).Name = "Net1"
(Get-ClusterNetwork | ? {$_.Address -like "192.168.100.*" }).Name = "Net2"
(Get-ClusterNetwork | ? {$_.Address -like "192.168.101.*" }).Name = "NetR1"
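# Cluster network role values: 0 = not used by the cluster, 1 = cluster (internal) traffic only, 3 = cluster and client traffic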
(Get-ClusterNetwork Net1).Role = 1
(Get-ClusterNetwork Net2).Role = 3
(Get-ClusterNetwork NetR1).Role = 3
(Get-Cluster).UseClientAccessNetworksForSharedVolumes=1

# Remove default DHCP-based IP address for JOSE-A and add 2 IP addresses on 100/101 networks
Stop-ClusterResource "Cluster Name"
Get-ClusterResource | ? { (($_.ResourceType -like "*Address*") -and ($_.OwnerGroup -eq "Cluster Group")) } | Remove-ClusterResource –Force

Add-ClusterResource -Name "Cluster IP Address 100" -Group "Cluster Group" -ResourceType "IP Address"
Get-ClusterResource –Name "Cluster IP Address 100" | Set-ClusterParameter -Multiple @{ “Network”=”Net2”; "Address"="192.168.100.19"; ”SubnetMask”=”255.255.255.0”; "EnableDhcp"=0 }
Get-ClusterResource “Cluster Name” | Add-ClusterResourceDependency –Resource "Cluster IP Address 100"

Add-ClusterResource -Name "Cluster IP Address 101" -Group "Cluster Group" -ResourceType "IP Address"
Get-ClusterResource –Name "Cluster IP Address 101" | Set-ClusterParameter -Multiple @{ “Network”=”NetR1”; "Address"="192.168.101.19"; ”SubnetMask”=”255.255.255.0”; "EnableDhcp"=0 }
Get-ClusterResource “Cluster Name” | Add-ClusterResourceDependency –Resource "Cluster IP Address 101"

Set-ClusterResourceDependency -Resource "Cluster Name" -Dependency "[Cluster IP Address 100] OR [Cluster IP Address 101] "
Start-ClusterResource "Cluster Name"

# Rename Witness Disk
$w = Get-ClusterResource | ? { $_.OwnerGroup -eq "Cluster Group" -and $_.ResourceType -eq "Physical Disk"}
$w.Name = "WitnessDisk"

# Add remaining disks to Cluster Shared Volumes
Get-ClusterResource | ? OwnerGroup -eq "Available Storage" | Add-ClusterSharedVolume

# Create Scale-Out File Server
Add-ClusterScaleOutFileServerRole -Name JOSE-F

 

9) Configure JOSE-B1 (Hyper-V Host Cluster B, Node 1)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-B1 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature
Install-WindowsFeature Hyper-V, Hyper-V-PowerShell, Hyper-V-Tools -Restart

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1
New-VMSwitch -NetAdapterName Net1 -Name Net1
Rename-NetAdapter -Name "vEthernet (Net1)" -NewName VNet1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
New-VMSwitch -NetAdapterName Net2 -Name Net2
Rename-NetAdapter -Name "vEthernet (Net2)" -NewName VNet2
Set-NetIPInterface -InterfaceAlias VNet2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias VNet2 -Confirm:$false
New-NetIPAddress -InterfaceAlias VNet2 -IPAddress 192.168.100.21 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias VNet2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.21 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

 

 

10) Configure JOSE-B2 (Hyper-V Host Cluster B, Node 2)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-B2 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature
Install-WindowsFeature Hyper-V, Hyper-V-PowerShell, Hyper-V-Tools -Restart

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1
New-VMSwitch -NetAdapterName Net1 -Name Net1
Rename-NetAdapter -Name "vEthernet (Net1)" -NewName VNet1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
New-VMSwitch -NetAdapterName Net2 -Name Net2
Rename-NetAdapter -Name "vEthernet (Net2)" -NewName VNet2
Set-NetIPInterface -InterfaceAlias VNet2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias VNet2 -Confirm:$false
New-NetIPAddress -InterfaceAlias VNet2 -IPAddress 192.168.100.22 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias VNet2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.22 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

 

 

11) Configure Hyper-V Host Cluster JOSE-B

 

# Validate cluster
Test-Cluster -Node JOSE-B1, JOSE-B2

# Create cluster
New-Cluster –Name JOSE-B -Node JOSE-B1, JOSE-B2
# Rename and configure networks
(Get-ClusterNetwork | ? {$_.Address -notlike "192.*" }).Name = "Net1"
(Get-ClusterNetwork | ? {$_.Address -like "192.168.100.*" }).Name = "Net2"
(Get-ClusterNetwork | ? {$_.Address -like "192.168.101.*" }).Name = "NetR1"
(Get-ClusterNetwork Net1).Role = 1
(Get-ClusterNetwork Net2).Role = 3
(Get-ClusterNetwork NetR1).Role = 3
(Get-Cluster).UseClientAccessNetworksForSharedVolumes=1

# Remove default DHCP-based IP address for JOSE-B and add 2 IP addresses on 100/101 networks
Stop-ClusterResource "Cluster Name"
Get-ClusterResource | ? { (($_.ResourceType -like "*Address*") -and ($_.OwnerGroup -eq "Cluster Group")) } | Remove-ClusterResource –Force

Add-ClusterResource -Name "Cluster IP Address 100" -Group "Cluster Group" -ResourceType "IP Address"
Get-ClusterResource –Name "Cluster IP Address 100" | Set-ClusterParameter -Multiple @{ “Network”=”Net2”; "Address"="192.168.100.29"; ”SubnetMask”=”255.255.255.0”; "EnableDhcp"=0 }
Get-ClusterResource “Cluster Name” | Add-ClusterResourceDependency –Resource "Cluster IP Address 100"

Add-ClusterResource -Name "Cluster IP Address 101" -Group "Cluster Group" -ResourceType "IP Address"
Get-ClusterResource –Name "Cluster IP Address 101" | Set-ClusterParameter -Multiple @{ “Network”=”NetR1”; "Address"="192.168.101.29"; ”SubnetMask”=”255.255.255.0”; "EnableDhcp"=0 }
Get-ClusterResource “Cluster Name” | Add-ClusterResourceDependency –Resource "Cluster IP Address 101"

Set-ClusterResourceDependency -Resource "Cluster Name" -Dependency "[Cluster IP Address 100] OR [Cluster IP Address 101] "
Start-ClusterResource "Cluster Name"

# Create Share for VMs (run from JOSE-A1)
1..6 | % {
MD C:\ClusterStorage\Volume$_\Share
New-SmbShare -Name SHARE$_ -Path C:\ClusterStorage\Volume$_\Share -FullAccess JOSE.Test\Administrator, JOSE.Test\JOSE-B1$, JOSE.Test\JOSE-B2$, JOSE.Test\JOSE-B$
Set-SmbPathAcl -ShareName SHARE$_
}

# Create Share for File Share Witness (run from JOSE-A1)
MD C:\ClusterStorage\Volume6\Witness
New-SmbShare -Name Witness -Path C:\ClusterStorage\Volume6\Witness -FullAccess JOSE.Test\Administrator, JOSE.Test\JOSE-B$
Set-SmbPathAcl -ShareName Witness

# Configure JOSE-B Cluster with a File Share Witness (run from JOSE-B1)
Set-ClusterQuorum -NodeAndFileShareMajority \\JOSE-F.JOSE.TEST\Witness

 

12) Configure VMs on the Hyper-V Host Cluster JOSE-B

 

# Create VHD files for VMs (run on JOSE-B1)
New-VHD -Path \\JOSE-F.JOSE.TEST\Share1\VM1OS.VHDX -Fixed -SizeBytes 40GB
New-VHD -Path \\JOSE-F.JOSE.TEST\Share2\VM2OS.VHDX -Fixed -SizeBytes 40GB
New-VHD -Path \\JOSE-F.JOSE.TEST\Share3\VM12Witness.VHDX -Fixed -SizeBytes 1GB
New-VHD -Path \\JOSE-F.JOSE.TEST\Share3\VM12Data.VHDX -Fixed -SizeBytes 10GB

# Create VM1 (run on JOSE-B1)
New-VM -Path \\JOSE-F.JOSE.TEST\Share1 -Name VM1 -VHDPath \\JOSE-F.JOSE.TEST\Share1\VM1OS.VHDX -SwitchName Net1 -Memory 2GB
Get-VMProcessor * | Set-VMProcessor -CompatibilityForMigrationEnabled 1
Add-VMNetworkAdapter -VMName VM1 -SwitchName Net2
Add-VMHardDiskDrive -VMName VM1 -Path \\JOSE-F.JOSE.TEST\Share3\VM12Witness.VHDX -ShareVirtualDisk
Add-VMHardDiskDrive -VMName VM1 -Path \\JOSE-F.JOSE.TEST\Share3\VM12Data.VHDX -ShareVirtualDisk
Set-VMDvdDrive -VMName VM1 -Path D:\WindowsServer2012.ISO
Start-VM VM1
Add-VMToCluster VM1

# Create VM2 (run on JOSE-B2)
New-VM -Path \\JOSE-F.JOSE.TEST\Share2 -Name VM2 -VHDPath \\JOSE-F.JOSE.TEST\Share2\VM2OS.VHDX -SwitchName Net1 -Memory 2GB
Get-VMProcessor * | Set-VMProcessor -CompatibilityForMigrationEnabled 1
Add-VMNetworkAdapter -VMName VM2 -SwitchName Net2
Add-VMHardDiskDrive -VMName VM2 -Path \\JOSE-F.JOSE.TEST\Share3\VM12Witness.VHDX -ShareVirtualDisk
Add-VMHardDiskDrive -VMName VM2 -Path \\JOSE-F.JOSE.TEST\Share3\VM12Data.VHDX -ShareVirtualDisk
Set-VMDvdDrive -VMName VM2 -Path D:\WindowsServer2012.ISO
Start-VM VM2
Add-VMToCluster VM2

 

13) Configure JOSE-X1 (SQL Server Guest Cluster X, Node 1)

 

# Preparation steps: Install WS2012R2, rename computer, install Hyper-V IC
Rename-Computer -NewName JOSE-X1 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename the two virtual ports as Net1 and Net2
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1
Get-NetAdapter Ethernet* | Rename-NetAdapter -NewName Net2

# Configure Net2 with a static IP address for DNS / DC
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.41 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure 2 shared disks
1..2 | % { 
    $Letter = "JK"[($_-1)]
    Set-Disk -Number $_ -IsReadOnly 0
    Set-Disk -Number $_ -IsOffline 0
    Initialize-Disk -Number $_ -PartitionStyle MBR
    New-Partition -DiskNumber $_ -DriveLetter $Letter -UseMaximumSize 
    Initialize-Volume -DriveLetter $Letter -FileSystem NTFS -Confirm:$false
}

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

 

14) Configure JOSE-X2 (SQL Server Guest Cluster X, Node 2)

 

# Preparation steps: Install WS2012R2, rename computer, install Hyper-V IC
Rename-Computer -NewName JOSE-X2 -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename the two virtual ports as Net1 and Net2
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1
Get-NetAdapter Ethernet* | Rename-NetAdapter -NewName Net2

# Configure Net2 with a static IP address for DNS / DC
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.42 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Shared Disks already configured on the first node

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

 

15) Configure SQL Server Guest Cluster JOSE-X

 

# Validate cluster
Test-Cluster -Node JOSE-X1, JOSE-X2

# Create cluster
New-Cluster –Name JOSE-X -Node JOSE-X1, JOSE-X2
# Rename and configure networks
(Get-ClusterNetwork | ? {$_.Address -notlike "192.*" }).Name = "Net1"
(Get-ClusterNetwork | ? {$_.Address -like "192.168.100.*" }).Name = "Net2"
(Get-ClusterNetwork Net1).Role = 1
(Get-ClusterNetwork Net2).Role = 3

# Remove default DHCP-based IP address for JOSE-X and add IP address on 100 network
Stop-ClusterResource "Cluster Name"
Get-ClusterResource | ? { (($_.ResourceType -like "*Address*") -and ($_.OwnerGroup -eq "Cluster Group")) } | Remove-ClusterResource –Force

Add-ClusterResource -Name "Cluster IP Address" -Group "Cluster Group" -ResourceType "IP Address"
Get-ClusterResource –Name "Cluster IP Address" | Set-ClusterParameter -Multiple @{ “Network”=”Net2”; "Address"="192.168.100.49"; ”SubnetMask”=”255.255.255.0”; "EnableDhcp"=0 }
Get-ClusterResource “Cluster Name” | Add-ClusterResourceDependency –Resource "Cluster IP Address"

Start-ClusterResource "Cluster Name"

# Rename Witness Disk
$w = Get-ClusterResource | ? { $_.OwnerGroup -eq "Cluster Group" -and $_.ResourceType -eq "Physical Disk"}
$w.Name = "WitnessDisk"

# Install SQL Server on nodes X1 and X2
# Use IP address 192.168.100.48 for the SQL Server group

 

 

 

16) Configure JOSE-V (VMM Server)

 

# Preparation steps: Install WS2012R2, rename computer
Rename-Computer -NewName JOSE-V -Restart

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Rename DHCP network adapter to Net1
Get-NetIPAddress -PrefixOrigin DHCP | Get-NetAdapter | Rename-NetAdapter -NewName Net1

# Configure Net2 with a static IP address for DNS / DC
Get-NetAdapter Eth* | ? Status -eq Up | ? InterfaceDescription -notmatch "Mellanox*" | Rename-NetAdapter -NewName Net2
Set-NetIPInterface -InterfaceAlias Net2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Net2 -Confirm:$false
New-NetIPAddress -InterfaceAlias Net2 -IPAddress 192.168.100.31 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Net2 -ServerAddresses 192.168.100.1

# Configure NetR1 with a static IP address for the RDMA network
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -eq Up | Rename-NetAdapter -NewName NetR1
Set-NetIPInterface -InterfaceAlias NetR1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias NetR1 -Confirm:$false
New-NetIPAddress -InterfaceAlias NetR1 -IPAddress 192.168.101.31 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias NetR1 -ServerAddresses 192.168.100.1

# Disable all disconnected adapters
Get-NetAdapter -InterfaceDescription "*IPoIB*" | ? Status -ne Up | Rename-NetAdapter -NewName NetRX
Get-NetAdapter | ? Status -ne Up | Disable-NetAdapter -Confirm:$false

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

# Install .NET Framework
Install-WindowsFeature NET-Framework-Core

# Install Windows Server 2012 R2 ADK
D:\adk\adksetup.exe
# Select only the “Deployment Tools” and “Windows PE” options

# Install SQL Server 2012 SP1
# New SQL Server standalone installation, Feature Selection: Database Engine Services

# Install VMM 2012 R2
# Features selected to be added: VMM management server, VMM console
# Database: VirtualManagerDB database will be created on JOSE-V
# Service Account: Local System account

 

17) Running the Scale-Out demo (from JOSE-V)

 

# Prepare - Map the SMB shares
1..6 | % {
    $d = "PQRSTU"[($_-1)] + ":"
    New-SmbMapping -LocalPath $d -RemotePath \\JOSE-F\Share$_ -Persistent $true
}

# Prepare - Create the test files
1..6 | % {
    $f = "PQRSTU"[($_-1)] + ":\testfile.dat"
    fsutil file createnew $f (5GB)
    fsutil file setvaliddata $f (5GB)
}

# Remove the third cluster node (demo starts with 2 nodes)
Remove-ClusterNode -Cluster JOSE-A -Name JOSE-A3

 

Start Performance Monitor

Start a performance monitor session

Switch to “Histogram Bar” view to show the performance side-by-side

Add a counter for SMB Server Shares, Data bytes/sec, _total instance for JOSE-A1, JOSE-A2 and JOSE-A3.

clip_image003
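If you prefer to sample the same counter from PowerShell instead of the Performance Monitor UI, a minimal sketch (assuming the default counter names on these nodes) would be:

# Minimal sketch: sample SMB Server Shares "Data Bytes/sec" on the three cluster nodes
Get-Counter -Counter "\SMB Server Shares(_Total)\Data Bytes/sec" -ComputerName JOSE-A1, JOSE-A2, JOSE-A3 -SampleInterval 2 -MaxSamples 5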

 

Query the cluster shared volume ownership on Cluster A, with 2 nodes

Get-ClusterSharedVolume -Cluster JOSE-A | Sort OwnerNode | FT OwnerNode, Name, State -AutoSize

OwnerNode Name           State
--------- ----           -----
JOSE-A1   Cluster Disk 6 Online
JOSE-A1   Cluster Disk 3 Online
JOSE-A1   Cluster Disk 4 Online
JOSE-A2   Cluster Disk 7 Online
JOSE-A2   Cluster Disk 2 Online
JOSE-A2   Cluster Disk 5 Online

 

Run SQLIO to issue 8KB IOs

C:\sqlio\sqlio.exe -s9999 -T100 -t1 -o16 -b8 -BN -LS -c2000 -frandom -dPQRSTU testfile.dat

clip_image001[7]

 

Add a 3rd node and wait for it to take 1/3 of the load

Add-ClusterNode -Cluster JOSE-A -Name JOSE-A3

# Wait 2 and a half minutes to transition to the following state.

clip_image002[9]

 

 

Re-query the cluster shared volume ownership on Cluster A, now with 3 nodes

Get-ClusterSharedVolume -Cluster JOSE-A | Sort OwnerNode | FT OwnerNode, Name, State -AutoSize

OwnerNode Name           State
--------- ----           -----
JOSE-A1   Cluster Disk 3 Online
JOSE-A1   Cluster Disk 6 Online
JOSE-A2   Cluster Disk 5 Online
JOSE-A2   Cluster Disk 7 Online
JOSE-A3   Cluster Disk 2 Online
JOSE-A3   Cluster Disk 4 Online

 

18) Configuring VMM on JOSE-V

 

Bring the File Server Cluster under VMM management

Select Fabric and use the option to add Storage Devices

clip_image001

Add a Windows-based file server

clip_image002[7]

Specify the full path to the file server cluster:

clip_image003[11]

 

Verify the File Server Cluster was properly discovered by VMM

Check the provider

clip_image004

Check the Storage Spaces discovery

clip_image005

Check the Scale-Out File Server and file share discovery

clip_image006

 

 

Remove the File Server Cluster node (demo starts with 2 nodes)

Under the properties of the File Server Cluster, remove node 3

clip_image007

Check progress under running Jobs

clip_image008[6]

 

 

While running a workload, add a File Server Cluster node

Under the properties of the File Server Cluster, add node 3, specifying the full path to the server

clip_image009

Check progress under running Jobs

clip_image010

 

 

19) Verifying systems’ configuration

 

# Commands to verify the configuration of all systems

"B1", "B2" | % { Get-VM -ComputerName JOSE-$_ }

"B1", "B2" | % { $_; Get-VM -ComputerName JOSE-$_ | Get-VMHardDiskDrive | FT VMName, ControllerType, ControllerLocation, Path -AutoSize}

Get-SmbOpenFile -CimSession JOSE-A1 | Sort ClientUserName, ShareRelativePath | FT ClientUserName, ShareRelativePath –AutoSize

"X1", "X2" | % { $_; Get-Disk -CimSession JOSE-$_ } | FT -AutoSize

"A", "B", "X" | % { $_; Get-ClusterNode -Cluster JOSE-$_ | FT Cluster, NodeName, State, Id -AutoSize }

"A", "B", "X" | % { $_; Get-ClusterResource -Cluster JOSE-$_ | FT -AutoSize}

 

20) Issues and FAQs (Frequently asked questions)

 

Failover Cluster creation in the guest fails.

  • Make sure you’re logged on as a domain user, not a local user
  • Make sure the Windows Server 2012 R2 integration components are installed in the guest (a quick check is sketched right after this list).
  • Check for individual warnings and errors in the Failover Clustering validation report.
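As a quick check for the second item, here is a minimal sketch (run from one of the Hyper-V hosts) that shows the integration services version reported for each VM:

# Minimal sketch: check the integration services version reported for each VM
"B1", "B2" | % { Get-VM -ComputerName JOSE-$_ | Format-Table Name, State, IntegrationServicesVersion -AutoSize }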

 

I can’t add a shared VHDX to a VM. I get a message saying that “the storage where the virtual hard disk is located does not support virtual hard disk sharing.”

  • Make sure you’re using a CSV disk or an SMB Scale-Out file share to store your VHDX files
  • Make sure the SMB Scale-Out file server cluster is running Windows Server 2012 R2

 

21) Final Notes

 

  • Keep in mind that there are dependencies between the services running on each computer.
  • To shut them down, start with VMM server, then the Hyper-V servers, then the File Server and finally the DNS/DC, waiting for each one to go down completely before moving to the next one.
  • To bring the servers up again, go from the DNS/DC to the File Server Cluster, then the Hyper-V Cluster and finally the VMM Server, waiting for the previous one to be fully up before starting the next one.
  • I hope you enjoyed these step-by-step instructions. I strongly encourage you to try them out and perform the entire installation yourself. It’s a good learning experience.
  • Let me know how these steps worked for you using the comment section. If you run into any issues or found anything particularly interesting, don’t forget to mention the number of the step.

 

To see all of the posts in this series, check out the What’s New in Windows Server & System Center 2012 R2 archive.

Windows Server 2012 R2 Storage: Step-by-step with Storage Spaces, SMB Scale-Out and Shared VHDX (Virtual)


This post is a part of the nine-part “What’s New in Windows Server & System Center 2012 R2” series that is featured on Brad Anderson’s In the Cloud blog.  Today’s blog post covers Windows Server 2012 R2 Storage and how it applies to the larger topic of “Transform the Datacenter.”  To read that post and see the other technologies discussed, read today’s post: “What’s New in 2012 R2: IaaS Innovations.”

  

1) Overview

 

In this document, I am sharing all the steps I used to create a Windows Server 2012 R2 File Server demo or test environment, so you can experiment with some of the new technologies yourself. You only need a single computer (the specs are provided below) and the ISO file with the Windows Server 2012 R2 Preview available as a free download.

 

The demo setup includes 5 virtual machines: 1 domain controller, 3 file servers and 1 file client/VMM Server. It uses the new Shared VHDX feature to provide shared storage for the guest failover cluster and it showcases both Storage Spaces and Scale-Out File Servers.

 

If you're not familiar with these technologies, I would strongly encourage you to review the TechEd 2013 talks on Hyper-V over SMB (Understanding the Hyper-V over SMB Scenario, Configurations, and End-to-End Performance) and Shared VHDX (second half of Application Availability Strategies for the Private Cloud).

 

Here’s a diagram of the setup:

clip_image002

 

Here are the details about the names, roles and IP addresses for each of the computers involved:

VM | Computer | Role | Net2 (DC/DNS)
Host | JOSE-EW | Hyper-V Host | 192.168.100.99/24
VM1 | JOSE-D | DNS, DC | 192.168.100.100/24
VM2 | JOSE-A1 | Cluster Node 1 | 192.168.100.101/24
VM3 | JOSE-A2 | Cluster Node 2 | 192.168.100.102/24
VM4 | JOSE-A3 | Cluster Node 3 | 192.168.100.103/24
VM5 | JOSE-V | VMM / Client | 192.168.100.104/24
N/A | JOSE-A | Cluster | 192.168.100.110/24
N/A | JOSE-F | Scale-Out File Server | N/A

 

Following these steps will probably require a few hours of work end-to-end, but it is a great way to experiment with a large set of Microsoft technologies in or related to Windows Server 2012 R2, including:

  • Hyper-V
  • Networking
  • Domain Name Services (DNS)
  • Active Directory Domain Services (AD-DS)
  • Shared VHDX
  • Storage Spaces
  • Failover Clustering
  • File Servers
  • PowerShell
  • SQL Server
  • Virtual Machine Manager

 

2) Required Hardware and Software, plus Disclaimers

 

You will need the following hardware to perform the steps described here:

 

You will need the following software to perform the steps described here:

 

Notes and disclaimers:

  • A certain familiarity with Windows administration and configuration is assumed. If you're new to Windows, this document is not for you. Sorry...
  • If you are asked a question or required to perform an action that you do not see described in these steps, go with the default option.
  • There are usually several ways to perform a specific configuration or administration task. What I describe here is one of those many ways. It's not necessarily the best way, just the one I personally like best at the moment.
  • For the most part, I use PowerShell to configure the systems. You can also use a graphical interface instead, but I did not describe those steps here.
  • Obviously, a single-computer solution can never be tolerant of the failure of that computer. So, the configuration described here is not really continuously available. It’s just a simulation.
  • The specific Shared VHDX configuration shown in this blog post is not supported. Microsoft Support will only answer questions and assist in troubleshooting configurations where the Shared VHDX files are stored either on a Cluster Shared Volume (CSV) directly or on a file share on a Scale-Out File Server Cluster.
  • The specific Storage Spaces configuration shown in this blog post is not supported. Microsoft Support will only answer questions and assist in troubleshooting configurations where Storage Spaces uses a physical machine (not a VM) and uses one of the certified JBOD hardware solutions (see http://www.windowsservercatalog.com/results.aspx?bCatID=1573&cpID=0&avc=10&OR=1)
  • Because of the two items above, the configuration described here should only be used for demos, testing or learning environments.
  • The Storage Spaces configuration shown here is not capable of showcasing some of the new features like Tiering or Write-back Cache, since it uses virtual disks without a proper media type. To use those features, you’ll need a storage pool with physical SSDs and HDDs.

 

3) Summarized instructions for experts

 

If you are already familiar with Failover Clustering, Scale-Out File Servers and Hyper-V, here’s a short introduction to what it takes to configure this environment on a single computer. A detailed step-by-step is available in the following sections of this document.

  • Configure a Hyper-V capable computer with at least 8GB of RAM with Windows Server 2012 R2. Make sure to load the Failover Clustering feature. No need to actually create a failover cluster on the physical machine.
  • Manually attach the Shared VHDX filter to the disk where the VHDX files will be located:
    fltmc.exe attach svhdxflt D:\
  • Create OS VHD or VHDX files as you regularly would, on an SMB file share. Create your VMs as you regularly would, using the OS VHD or VHDX file. Both Generation 1 and Generation 2 VMs are fine.
  • Create your VHDX data files to be shared as fixed-size or dynamically expanding, on the disk where you manually attached the Shared VHDX filter. Old VHD files are not allowed. Differencing disks are not allowed.
  • Add the shared VHDX data files to multiple VMs, using the Add-VMHardDiskDrive and the “-ShareVirtualDisk” option. If using the GUI, check the box for “Share virtual disk” when adding the VHDX data files to the VM.
  • Inside the VM, install Windows Server 2012 R2 or Windows Server 2012. Make sure to install the Hyper-V integration components.
  • Configure Storage Spaces using the Virtual SAS disks exposed to the VMs.
  • Validate and configure the cluster inside the VMs as you normally would. Configure a Scale-Out file server and create the file shares for testing.
  • Install VMM 2012 R2 Preview and use it to manage the Scale-Out File Server.

 

The rest of this document contains detailed step-by-step instructions for each of the items outlined above.

 

4) Configure the physical host

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restart at the end
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
Install-WindowsFeature Hyper-V -IncludeManagementTools -Restart

# Create a new Internal Switch
New-VMSwitch -Name Net2 -SwitchType Internal
Get-NetAdapter *Net2* | Rename-NetAdapter -NewName vNet2

# Configure vNet2 (parent port for internal switch) with a static IP address for DNS / DC
Set-NetIPInterface -InterfaceAlias vNet2 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias vNet2 -Confirm:$false
New-NetIPAddress -InterfaceAlias vNet2 -IPAddress 192.168.100.1 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias vNet2 -ServerAddresses 192.168.100.1

# Create 5 VMs, assumes there is a base VM ready at D:\VMS\BaseOS.VHDX
1..5 | % { New-VHD -Path D:\VMS\VM$_.VHDX –ParentPath D:\VMS\BaseOS.VHDX }
1..5 | % { New-VM -Name VM$_ -Path D:\VMS –VHDPath D:\VMS\VM$_.VHDX -Memory 2GB -SwitchName Net2 }
# Give the VMM VM a little more RAM
Set-VM -VMName VM5 -MemoryStartupBytes 4GB

# Create 3 data VHDX files
1..3 | % { New-VHD -Path D:\VMS\Data$_.VHDX -Fixed -SizeBytes 20GB }

# Manually attach volume D:\
fltMC.exe attach svhdxflt D:\
# NOTE 1: Non-CSV deployments of Shared VHDX are not supported. See the disclaimer section of this document.
# NOTE 2: The manual attach is not saved across reboots. You will have to re-issue the command after each reboot.
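# (Hedged sketch) After a reboot, you can check whether the Shared VHDX filter is still attached to D:\
# by listing the filter instances on that volume, and re-issue the attach command above if svhdxflt is not listed:
# fltMC.exe instances -v D:\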

# Add the 3 data VHDX files to each of the 3 VMs, with Sharing option
1..3 | % { $p = ”D:\VMS\Data” + $_ + ”.VHDX” ; 2..4 | % { $v = ”VM” + $_; Write-Host $v, $p; Add-VMHardDiskDrive -VMName $v -Path $p -ShareVirtualDisk } }

# Start all the VMs
Get-VM | Start-VM

 

5) Configure JOSE-D (DNS, DC)

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restarts at the end
Install-WindowsFeature AD-Domain-Services, RSAT-ADDS, RSAT-ADDS-Tools

# Configure Internal NIC with a static IP address for DNS / DC
Get-NetAdapter | Rename-NetAdapter -NewName Internal
Set-NetIPInterface -InterfaceAlias Internal -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Internal -Confirm:$false
New-NetIPAddress -InterfaceAlias Internal -IPAddress 192.168.100.100 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Internal -ServerAddresses 192.168.100.100

# create AD forest, reboots at the end
Install-ADDSForest -CreateDNSDelegation:$false -DatabasePath "C:\Windows\NTDS" -DomainMode "Win2012" -DomainName "JOSE.TEST" -DomainNetBIOSName "JOSE" -ForestMode "Win2012" -InstallDNS:$true -LogPath "C:\Windows\NTDS" -SYSVOLPath "C:\Windows\SYSVOL" 

 

6) Configure JOSE-A1 (Cluster A)

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Configure Internal NIC with a static IP address for DNS / DC
Get-NetAdapter | Rename-NetAdapter -NewName Internal
Set-NetIPInterface -InterfaceAlias Internal -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Internal -Confirm:$false
New-NetIPAddress -InterfaceAlias Internal -IPAddress 192.168.100.101 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Internal -ServerAddresses 192.168.100.100

# Create 1 Pool, 7 Spaces and Initialize them
# NOTE 3: Running Storage Spaces in a guest is not supported. See the disclaimer section of this document.
# NOTE 4: This unsupported configuration cannot simulate Tiering or Write-back cache, since there are no SSDs.

$s = Get-StorageSubSystem -FriendlyName *Spaces*
New-StoragePool -FriendlyName Pool1 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
Set-ResiliencySetting -Name Mirror -NumberofColumnsDefault 1 -StoragePool (Get-StoragePool -FriendlyName Pool1)

New-VirtualDisk -FriendlyName Space1 -StoragePoolFriendlyName Pool1 -ResiliencySettingName Mirror –Size 1GB
2..7 | % { New-VirtualDisk -FriendlyName Space$_ -StoragePoolFriendlyName Pool1 -ResiliencySettingName Mirror –Size 3GB }

1..7 | % {
    $Letter = "PQRSTUV"[($_-1)]
    $Number = (Get-VirtualDisk -FriendlyName Space$_ | Get-Disk).Number
    Set-Disk -Number $Number -IsReadOnly 0
    Set-Disk -Number $Number -IsOffline 0
    Initialize-Disk -Number $Number -PartitionStyle MBR
    New-Partition -DiskNumber $Number -DriveLetter $Letter -UseMaximumSize
    Initialize-Volume -DriveLetter $Letter -FileSystem NTFS -Confirm:$false
}

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

 

7) Configure JOSE-A2 (Cluster A)

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Configure Internal NIC with a static IP address for DNS / DC
Get-NetAdapter | Rename-NetAdapter -NewName Internal
Set-NetIPInterface -InterfaceAlias Internal -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Internal -Confirm:$false
New-NetIPAddress -InterfaceAlias Internal -IPAddress 192.168.100.102 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Internal -ServerAddresses 192.168.100.100

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) –Restart

 

8) Configure JOSE-A3 (Cluster A)

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restarts at the end
Install-WindowsFeature File-Services, FS-FileServer, Failover-Clustering
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Configure Internal NIC with a static IP address for DNS / DC
Get-NetAdapter | Rename-NetAdapter -NewName Internal
Set-NetIPInterface -InterfaceAlias Internal -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Internal -Confirm:$false
New-NetIPAddress -InterfaceAlias Internal -IPAddress 192.168.100.103 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Internal -ServerAddresses 192.168.100.100

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

 

9) Configure Cluster JOSE-A

 

# Validate cluster
Test-Cluster -Node JOSE-A1, JOSE-A2, JOSE-A3

# Create cluster
New-Cluster –Name JOSE-A -Node JOSE-A1, JOSE-A2, JOSE-A3 -StaticAddress 192.168.100.110

# Rename and configure networks
(Get-ClusterNetwork).Name = “Internal”
(Get-ClusterNetwork).Role = 3
(Get-Cluster).UseClientAccessNetworksForSharedVolumes=1

# Rename Witness Disk
$w = Get-ClusterResource | ? { $_.OwnerGroup -eq "Cluster Group" -and $_.ResourceType -eq "Physical Disk"}
$w.Name = "WitnessDisk"

# Add remaining disks to Cluster Shared Volumes
Get-ClusterResource | ? OwnerGroup -eq "Available Storage" | Add-ClusterSharedVolume

# Create Scale-Out File Server
Add-ClusterScaleOutFileServerRole JOSE-F

# Create SMB shares
1..6 | % {
MD C:\ClusterStorage\Volume$_\Share
New-SmbShare -Name Share$_ -Path C:\ClusterStorage\Volume$_\Share -FullAccess JOSE.Test\Administrator
Set-SmbPathAcl -ShareName Share$_
}

 

10) Configure JOSE-V

 

# Preparation steps: Install WS2012R2, rename computer, enable remote desktop

# Install required roles and features, restarts at the end
Install-WindowsFeature RSAT-Clustering -IncludeAllSubFeature

# Configure Internal NIC with a static IP address for DNS / DC
Get-NetAdapter | Rename-NetAdapter -NewName Internal
Set-NetIPInterface -InterfaceAlias Internal -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias Internal -Confirm:$false
New-NetIPAddress -InterfaceAlias Internal -IPAddress 192.168.100.104 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias Internal -ServerAddresses 192.168.100.100

# Join domain, restart the machine
Add-Computer -DomainName JOSE.TEST -Credential (Get-Credential) -Restart

# Map the SMB shares
1..6 | % {
    $d = "PQRSTU"[($_-1)] + ":"
    New-SmbMapping -LocalPath $d -RemotePath \\JOSE-F\Share$_ -Persistent $true
}

# Create the test files
1..6 | % {
    $f = "PQRSTU"[($_-1)] + ":\testfile.dat"
    fsutil file createnew $f (256MB)
    fsutil file setvaliddata $f (256MB)
}

# Run SQLIO (assumes SQLIO.EXE was copied to C:\SQLIO)
c:\sqlio\sqlio.exe -s9999 -T100 -t1 -o16 -b8 -BN -LS -c2000 -frandom -dPQRSTU testfile.dat

# To remove a node while the workload is running (wait a few minutes for rebalancing)
Remove-ClusterNode -Cluster JOSE-A -Name JOSE-A3

# To add a node while the workload is running (wait a few minutes for rebalancing)
Add-ClusterNode -Cluster JOSE-A -Name JOSE-A3

 

 

11) Running the tests without VMM (from JOSE-V)

 

Start Performance Monitor

Start a performance monitor session

Switch to “Histogram Bar” view to show the performance side-by-side

Add a counter for SMB Server Shares, Data bytes/sec, _total instance for JOSE-A1, JOSE-A2 and JOSE-A3.
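
If you prefer to sample that same counter from PowerShell instead of (or in addition to) Performance Monitor, a minimal sketch is shown below. It assumes the counter path as displayed in Performance Monitor and the node names used in this demo:

# Sample the SMB Server Shares throughput on all three nodes every 2 seconds
Get-Counter -ComputerName JOSE-A1, JOSE-A2, JOSE-A3 -Counter "\SMB Server Shares(_total)\Data Bytes/sec" -SampleInterval 2 -Continuous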

clip_image003

 

Query the cluster shared volume ownership on Cluster A, with 2 nodes

Get-ClusterSharedVolume -Cluster JOSE-A | Sort OwnerNode | FT OwnerNode, Name, State -AutoSize

OwnerNode Name           State
--------- ----           -----
JOSE-A1   Cluster Disk 6 Online
JOSE-A1   Cluster Disk 3 Online
JOSE-A1   Cluster Disk 4 Online
JOSE-A2   Cluster Disk 7 Online
JOSE-A2   Cluster Disk 2 Online
JOSE-A2   Cluster Disk 5 Online

 

Run SQLIO to issue 8KB IOs

C:\sqlio\sqlio.exe -s9999 -T100 -t1 -o16 -b8 -BN -LS -c2000 -frandom -dPQRSTU testfile.dat

clip_image004

  

Add a 3rd node and wait for it to take 1/3 of the load

Add-ClusterNode -Cluster JOSE-A -Name JOSE-A3

Wait 2 and a half minutes to transition to the following state.

clip_image005

 

Re-query the cluster shared volume ownership on Cluster A, now with 3 nodes

Get-ClusterSharedVolume -Cluster JOSE-A | Sort OwnerNode | FT OwnerNode, Name, State -AutoSize

OwnerNode Name           State
--------- ----           -----
JOSE-A1   Cluster Disk 3 Online
JOSE-A1   Cluster Disk 6 Online
JOSE-A2   Cluster Disk 5 Online
JOSE-A2   Cluster Disk 7 Online
JOSE-A3   Cluster Disk 2 Online
JOSE-A3   Cluster Disk 4 Online

 

12) Installing VMM on JOSE-V

 

# Install .NET Framework
Install-WindowsFeature NET-Framework-Core

# Install Windows Server 2012 R2 ADK
C:\ADK\adksetup.exe
# Select only the “Deployment Tools” and “Windows PE” options

# Install SQL Server 2012 SP1
# New SQL Server standalone installation, Feature Selection: Database Engine Services

# Install VMM 2012 R2
# Features selected to be added: VMM management server, VMM console
# Database: VirtualManagerDB database will be created on JOSEBDA-V
# Service Account: Local System account

 

13) Configuring VMM on JOSE-V

 

Bring the File Server Cluster under VMM management

Select Fabric and use the option to add Storage Devices

clip_image006

Add a Windows-based file server

clip_image007

Specify the full path to the file server cluster:

clip_image008

 

Verify the File Server Cluster was properly discovered by VMM

Check the provider

clip_image009

Check the Storage Spaces discovery

clip_image010

Check the Scale-Out File Server and file share discovery

clip_image011

 

 

Remove the File Server Cluster node (demo starts with 2 nodes)

Under the properties of the File Server Cluster, remove node 3

clip_image012

Check progress under running Jobs

clip_image013


 

While running a workload, add a File Server Cluster node

Under the properties of the File Server Cluster, add node 3, specifying the full path to the server

clip_image014

Check progress under running Jobs

clip_image015

 

14) Verifying systems’ configuration

 PS C:\> Get-PhysicalDisk -CimSession JOSE-A1, JOSE-A2, JOSE-A3 | Sort PSComputerName, Size

 

PS C:\> Get-Disk -CimSession JOSE-A1, JOSE-A2, JOSE-A3 | Sort PSComputerName, Size

Number Friendly Name                  OperationalStatus Total Size Partition Style PSComputerName
------ -------------                  ----------------- ---------- --------------- --------------
4      Microsoft Storage Space Device Online                  1 GB MBR             JOSE-A1
10     Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A1
8      Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A1
0      Virtual HD ATA Device          Online                 40 GB MBR             JOSE-A1
4      Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A2
5      Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A2
0      Virtual HD ATA Device          Online                 40 GB MBR             JOSE-A2
4      Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A3
5      Microsoft Storage Space Device Online                  3 GB MBR             JOSE-A3
0      Virtual HD ATA Device          Online                 40 GB MBR             JOSE-A3

PS C:\> Get-ClusterNode -Cluster JOSE-A | FT Cluster, NodeName, State, Id -AutoSize

Cluster NodeName State Id
------- -------- ----- --
JOSE-A  JOSE-A1     Up 2
JOSE-A  JOSE-A2     Up 1
JOSE-A  JOSE-A3     Up 3

PS C:\> Get-ClusterResource -Cluster JOSE-A | FT –AutoSize

Name                             State  OwnerGroup    ResourceType
----                             -----  ----------    ------------
Cluster IP Address               Online Cluster Group IP Address
Cluster Name                     Online Cluster Group Network Name
Cluster Pool 1                   Online Cluster Group Storage Pool
JOSE-F                           Online JOSE-F        Distributed Network Name
Scale-Out File Server (\\JOSE-F) Online JOSE-F        Scale Out File Server
WitnessDisk                      Online Cluster Group Physical Disk

PS C:\> Get-ClusterSharedVolume -Cluster JOSE-A | FT -AutoSize

Name           State  Node
----           -----  ----
Cluster Disk 2 Online JOSE-A3
Cluster Disk 3 Online JOSE-A2
Cluster Disk 4 Online JOSE-A3
Cluster Disk 5 Online JOSE-A1
Cluster Disk 6 Online JOSE-A2
Cluster Disk 7 Online JOSE-A1

PS C:\> Get-SmbShare Share* -CimSession JOSE-F | FT -AutoSize

Name   ScopeName Path                            Description PSComputerName
----   --------- ----                            ----------- --------------
Share1 JOSE-F    C:\ClusterStorage\Volume1\Share             JOSE-F
Share2 JOSE-F    C:\ClusterStorage\Volume2\SHARE             JOSE-F
Share3 JOSE-F    C:\ClusterStorage\Volume3\Share             JOSE-F
Share4 JOSE-F    C:\ClusterStorage\Volume4\Share             JOSE-F
Share5 JOSE-F    C:\ClusterStorage\Volume5\SHARE             JOSE-F
Share6 JOSE-F    C:\ClusterStorage\Volume6\Share             JOSE-F

  

15) Final Notes

 

  • Keep in mind that there are dependencies between the services running on each VM.
  • To shut them down, start with VM5 and end with VM1, waiting for each one to go down completely before moving to the next one.
  • To bring the VMs up again, go from VM1 to VM5, waiting for the previous one to be fully up (with low to no CPU usage) before starting the next one.
  • I hope you enjoyed these step-by-step instructions. I strongly encourage you to try them out and perform the entire installation yourself. It’s a good learning experience.
  • Let me know how these steps worked for you using the comment section. If you run into any issues or found anything particularly interesting, don’t forget to mention the number of the step.

 

To see all of the posts in this series, check out the What’s New in Windows Server & System Center 2012 R2 archive.

SNIA’s Storage Developer Conference 2013 is just a few weeks away


The Storage Networking Industry Association (SNIA) is hosting the 10th Storage Developer Conference (SDC) in the Hyatt Regency in beautiful Santa Clara, CA (Silicon Valley) on the week of September 16th. As usual, Microsoft is one of the underwriters of the SNIA SMB2/SMB3 PlugFest, which is co-located with the SDC event.

For developers working with storage-related technologies, this event gathers a unique crowd and includes a rich agenda that you can find at http://www.storagedeveloper.org. Many of the key industry players are represented and this year’s agenda lists presentations from EMC, Fujitsu, Google, Hortonworks, HP, Go Daddy, Huawei, IBM, Intel, Microsoft, NEC, NetApp, Netflix, Oracle, Red Hat, Samba Team, Samsung, Spectra Logic, SwiftTest, Tata and many others.

It’s always worth reminding you that the SDC presentations are usually delivered to developers by the actual product development teams and frequently the actual developer of the technology is either delivering the presentation or is in the room to take questions. That kind of deep insight is not common in every conference out there.

Presentations by Microsoft this year include:

  • Advancements in Windows File Systems, presented by Neal Christiansen, Principal Development Lead, Microsoft
  • LRC Erasure Coding in Windows Storage Spaces, presented by Cheng Huang, Researcher, Microsoft Research
  • SMB3 Update, presented by David Kruse, Development Lead, Microsoft
  • Cluster Shared Volumes, presented by Vladimir Petter, Principal Software Design Engineer, Microsoft
  • Tunneling SCSI over SMB: Shared VHDX files for Guest Clustering in Windows Server 2012 R2, presented by Jose Barreto, Principal Program Manager, Microsoft, and Matt Kurjanowicz, Software Development Engineer, Microsoft
  • Windows Azure Storage - Speed and Scale in the Cloud, presented by Joe Giardino, Senior Development Lead, Microsoft
  • SMB Direct update, presented by Greg Kramer, Sr. Software Engineer, Microsoft
  • Scaled RDMA Performance & Storage Design with Windows Server SMB 3.0, presented by Dan Lovinger, Principal Software Design Engineer, Microsoft
  • SPEC SFS Benchmark - The Next Generation, presented by Spencer Shepler, Architect, Microsoft
  • Data Deduplication as a Platform for Virtualization and High Scale Storage, presented by Adi Oltean, Principal Software Design Engineer, Microsoft, and Sudipta Sengupta, Sr. Researcher, Microsoft

For a taste of what SDC presentations look like, make sure to visit the site for last year’s event, where you can find the downloadable PDF files for most and video recordings for some. You can find them at http://www.snia.org/events/storage-developer2012/presentations12.

Registration for SDC 2013 is open at http://www.storagedeveloper.org and you should definitely plan to attend. If you are registered, leave a comment and let’s plan to meet when we get there!

Step-by-step for Storage Spaces Tiering in Windows Server 2012 R2


1. Overview

 

In this document, I am sharing all the steps I used to create a demo or test environment of a storage space with storage tiers on Windows Server 2012 R2 Preview so that you can experiment with some of the new technologies yourself. You need only a single computer with one SSD and one HDD for this demo (the specs are provided below).

If you're not familiar with Storage Spaces or storage tiers (also known as Storage Spaces Tiering), I would strongly encourage you to review the Storage Spaces Overview and the TechEd 2013 talks on “Storage Spaces”, which you can find below:

  • Storage Spaces Overview

  • Storage Spaces: What’s New in Windows Server 2012 R2

  • Deploying Windows Server 2012 R2 File Services for Exceptional $/IOPS

 

2. Simulated environment with limited hardware

 

In a typical storage tier configuration (non-clustered), you have a JBOD with a few disks, some being SSDs and some being HDDs. For instance, you could have 4 SSDs and 8 HDDs. You would then combine all 12 disks into a pool, create the two tiers (SSD and HDD) and then create spaces (virtual disks) on top of those. Here’s what it looks like:

image

However, if you’re just experimenting with this feature or trying to learn how to configure it, the investment in the hardware (1 JBOD, 4 SSDs, 8 HDDs, SAS HBA, cables) might be holding you back. So, in this blog post, we’ll show how to configure storage tiers using just one SSD and one HDD, with the help of Hyper-V. Here’s what the simulated environment looks like:

image

As you can see in the diagram, the basic goal here is to simulate a tiered storage space using 4 (simulated) SSDs and 8 (simulated) HDDs. We’ll show the Windows PowerShell cmdlets to create the VHDs and the VM required to simulate the environment. Then, inside the guest, we’ll create the pool, storage tiers and storage spaces (both simple and mirrored). To verify it’s all working, we’ll create some test files and run a simulated SQL workload. We’ll also look into the storage tier maintenance tasks and extending an existing storage space.

 

3. Required Hardware and Software, plus Disclaimers

 

You will need the following hardware to perform the steps described here:

You will need the following software to perform the steps described here:

Notes and disclaimers:

  • A certain familiarity with Windows administration and configuration is assumed. If you're new to Windows, this document is not for you. Sorry...
  • If you are asked a question or required to perform an action that you do not see described in these steps, go with the default option.
  • There are usually several ways to perform a specific configuration or administration task. What I describe here is one of those many ways. It's not necessarily the best way, just the one I personally like best at the moment.
  • For the most part, I use PowerShell to configure the systems. You can also use a graphical interface instead, but I did not describe those steps here.
  • The specific Storage Spaces configuration shown in this blog post is not supported. Microsoft Support will only answer questions and assist in troubleshooting configurations where Storage Spaces uses a physical machine (not a VM) and uses one of the certified JBOD hardware solutions (see http://www.windowsservercatalog.com/results.aspx?bCatID=1573&cpID=0&avc=10&OR=1)
  • Because of the item above, the configuration described here should only be used for demos, testing or learning environments.

 

4. Configure the physical host

 

# Preparation steps: Install Windows Server 2012 R2 Preview

# Install required roles and features, restart at the end

Install-WindowsFeature Hyper-V -IncludeManagementTools -Restart

 

# Create 4 VHDX files on the SSD with 10GB each

1..4 | % { New-VHD -Path D:\VMS\VMA_SSD_$_.VHDX -Fixed -Size 10GB }

 

# Create 8 VHDX files on the HDD with 30GB each

1..8 | % { New-VHD -Path E:\VMS\VMA_HDD_$_.VHDX -Fixed -Size 30GB }

 

# Create a new VM. Assumes you have a Windows Server 2012 R2 OS VHDX in place

New-VM -Name VMA -Path D:\VMS -VHDPath D:\VMS\VMA_OS.VHDX -MemoryStartupBytes 2GB

 

# Add all data disks to the VM

1..4 | % { Add-VMHardDiskDrive -VMName VMA -ControllerType SCSI -Path D:\VMS\VMA_SSD_$_.VHDX }

1..8 | % { Add-VMHardDiskDrive -VMName VMA -ControllerType SCSI -Path E:\VMS\VMA_HDD_$_.VHDX }

 

# Start the VM

Start-VM VMA

   

5. Check VM configuration (from the Host, with output)

 

 

PS C:\>Get-VM VMA
 

Name State   CPUUsage(%) MemoryAssigned(M) Uptime   Status
---- -----   ----------- ----------------- ------   ------
VMA  Running 0           2048              00:01:52 Operating normally
 

PS C:\>Get-VM VMA | Get-VMHardDiskDrive
 
VMName ControllerType ControllerNumber ControllerLocation DiskNumber Path
------ -------------- ---------------- ------------------ ---------- ----
VMA    IDE            0                0                             D:\VMS\VMA_OS.VHDX
VMA    SCSI           0                0                             D:\VMS\VMA_SSD_1.VHDX
VMA    SCSI           0                1                             D:\VMS\VMA_SSD_2.VHDX
VMA    SCSI           0                2                             D:\VMS\VMA_SSD_3.VHDX
VMA    SCSI           0                3                             D:\VMS\VMA_SSD_4.VHDX
VMA    SCSI           0                4                             E:\VMS\VMA_HDD_1.VHDX
VMA    SCSI           0                5                             E:\VMS\VMA_HDD_2.VHDX
VMA    SCSI           0                6                             E:\VMS\VMA_HDD_3.VHDX
VMA    SCSI           0                7                             E:\VMS\VMA_HDD_4.VHDX
VMA    SCSI           0                8                             E:\VMS\VMA_HDD_5.VHDX
VMA    SCSI           0                9                             E:\VMS\VMA_HDD_6.VHDX
VMA    SCSI           0                10                            E:\VMS\VMA_HDD_7.VHDX
VMA    SCSI           0                11                            E:\VMS\VMA_HDD_8.VHDX
 

 

6. Check VM configuration (from the Guest, with output)

 

PS C:\> Get-PhysicalDisk | Sort Size | FT DeviceId, FriendlyName, CanPool, Size, MediaType -AutoSize
 
DeviceId FriendlyName   CanPool        Size MediaType
-------- ------------   -------        ---- ---------
2        PhysicalDisk2     True 10737418240 UnSpecified
4        PhysicalDisk4     True 10737418240 UnSpecified
3        PhysicalDisk3     True 10737418240 UnSpecified
1        PhysicalDisk1     True 10737418240 UnSpecified
12       PhysicalDisk12    True 32212254720 UnSpecified
11       PhysicalDisk11    True 32212254720 Unspecified
7        PhysicalDisk7     True 32212254720 Unspecified
6        PhysicalDisk6     True 32212254720 Unspecified
5        PhysicalDisk5     True 32212254720 Unspecified
10       PhysicalDisk10    True 32212254720 UnSpecified
9        PhysicalDisk9     True 32212254720 Unspecified
8        PhysicalDisk8     True 32212254720 Unspecified
0        PhysicalDisk0    False 42949672960 Unspecified
 

PS C:\> Get-PhysicalDisk -CanPool $true | ? Size -lt 20GB | Sort Size | FT -AutoSize
 
FriendlyName  CanPool OperationalStatus HealthStatus Usage        Size
------------  ------- ----------------- ------------ -----        ----
PhysicalDisk4 True    OK                Healthy      Auto-Select 10 GB
PhysicalDisk2 True    OK                Healthy      Auto-Select 10 GB
PhysicalDisk1 True    OK                Healthy      Auto-Select 10 GB
PhysicalDisk3 True    OK                Healthy      Auto-Select 10 GB
 

PS C:\> Get-PhysicalDisk -CanPool $true | ? Size -gt 20GB | Sort Size | FT -AutoSize
 

FriendlyName   CanPool OperationalStatus HealthStatus Usage        Size
------------   ------- ----------------- ------------ -----        ----
PhysicalDisk6  True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk11 True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk12 True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk7  True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk5  True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk10 True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk8  True    OK                Healthy      Auto-Select 30 GB
PhysicalDisk9  True    OK                Healthy      Auto-Select 30 GB
 

7. Configure media type for virtual SAS disks

 

# Create Storage Pool

$s = Get-StorageSubSystem

New-StoragePool -StorageSubSystemId $s.UniqueId -FriendlyName Pool1 -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

 

# Configure media type for virtual SAS disks

Get-StoragePool Pool1 | Get-PhysicalDisk | ? Size -lt 20GB | Set-PhysicalDisk -MediaType SSD

Get-StoragePool Pool1 | Get-PhysicalDisk | ? Size -gt 20GB | Set-PhysicalDisk -MediaType HDD

 

 

8. Check media type configuration (with output)

 

PS C:\> Get-StoragePool Pool1

 

FriendlyName            OperationalStatus       HealthStatus            IsPrimordial            IsReadOnly

------------            -----------------       ------------            ------------            ----------

Pool1                   OK                      Healthy                 False                   False

 

 

PS C:\> Get-StoragePool Pool1 | Get-PhysicalDisk | Sort Size | FT –AutoSize

 

FriendlyName   CanPool OperationalStatus HealthStatus Usage           Size

------------   ------- ----------------- ------------ -----           ----

PhysicalDisk3  False   OK                Healthy      Auto-Select  9.25 GB

PhysicalDisk2  False   OK                Healthy      Auto-Select  9.25 GB

PhysicalDisk1  False   OK                Healthy      Auto-Select  9.25 GB

PhysicalDisk4  False   OK                Healthy      Auto-Select  9.25 GB

PhysicalDisk11 False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk12 False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk7  False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk6  False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk9  False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk5  False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk8  False   OK                Healthy      Auto-Select 29.25 GB

PhysicalDisk10 False   OK                Healthy      Auto-Select 29.25 GB

 

 

PS C:\> Get-StoragePool Pool1 | Get-PhysicalDisk | Sort Size | FT FriendlyName, Size, MediaType, HealthStatus, OperationalStatus -AutoSize

 

FriendlyName          Size MediaType HealthStatus OperationalStatus

------------          ---- --------- ------------ -----------------

PhysicalDisk3   9932111872 SSD       Healthy      OK

PhysicalDisk2   9932111872 SSD       Healthy      OK

PhysicalDisk1   9932111872 SSD       Healthy      OK

PhysicalDisk4   9932111872 SSD       Healthy      OK

PhysicalDisk11 31406948352 HDD       Healthy      OK

PhysicalDisk12 31406948352 HDD       Healthy      OK

PhysicalDisk7  31406948352 HDD       Healthy      OK

PhysicalDisk6  31406948352 HDD       Healthy      OK

PhysicalDisk9  31406948352 HDD       Healthy      OK

PhysicalDisk5  31406948352 HDD       Healthy      OK

PhysicalDisk8  31406948352 HDD       Healthy      OK

PhysicalDisk10 31406948352 HDD       Healthy      OK

 

PS C:\> Get-StoragePool Pool1 | Get-PhysicalDisk | Group MediaType, Size | Sort Name | FT -AutoSize

 

Count Name             Group

----- ----             -----

    8 HDD, 31406948352 {MSFT_PhysicalDisk (ObjectId = "{1}\\WIN-T5PORQGQECN\root/Microsoft/Win...), MSFT_PhysicalDis...

    4 SSD, 9932111872  {MSFT_PhysicalDisk (ObjectId = "{1}\\WIN-T5PORQGQECN\root/Microsoft/Win...), MSFT_PhysicalDis...

 

PS C:\> Get-StoragePool Pool1 | FL Size, AllocatedSize

 

Size          : 290984034304
AllocatedSize : 3221225472 

 

9. Configure Tiers

 

# Configure two tiers

Get-StoragePool Pool1 | New-StorageTier -FriendlyName SSDTier -MediaType SSD

Get-StoragePool Pool1 | New-StorageTier -FriendlyName HDDTier -MediaType HDD

 

 

10. Check Tiers configuration (with output)

 

PS C:\> Get-StorageTier | FT FriendlyName, MediaType, Size -AutoSize

 

FriendlyName   MediaType        Size

------------   ---------        ----

SSDTier        SSD                 0

HDDTier        HDD                 0

 

PS C:\> Get-StoragePool Pool1 | FL Size, AllocatedSize

 

Size          : 290984034304

AllocatedSize : 3221225472

 

Note: 3GB used out of 271GB total. Storage Spaces reserves 256MB on each disk in the pool for internal VD metadata. 256MB * 12 drives = 3GB.

 

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

 

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              4294967296 34359738368      4294967296

 

Note: 32GB on the SSD Tier (8 GB * 4). Minimum size is 4GB.

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Mirror | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              2147483648 17179869184      2147483648

 

Note: Mirrored offers 16GB on the SSD Tier (8 GB * 4 / 2). Minimum size is 2GB.

 

 

PS C:\> Get-StorageTierSupportedSize HDDTier -ResiliencySettingName Simple | FT -AutoSize

 

SupportedSizes TierSizeMin  TierSizeMax TierSizeDivisor

-------------- -----------  ----------- ---------------

{}              8589934592 249108103168      8589934592

 

Note: 232GB on the HDD Tier (29 GB * 8). Minimum size is 8GB.

 

PS C:\> Get-StorageTierSupportedSize HDDTier -ResiliencySettingName Mirror | FT -AutoSize

 

SupportedSizes TierSizeMin  TierSizeMax TierSizeDivisor

-------------- -----------  ----------- ---------------

{}              4294967296 124554051584      4294967296

 

Note: Mirrored offers 116GB on the HDD Tier (29 GB * 8 / 2). Minimum size is 4GB.

 

11. Resiliency Settings and Spaces

 

 

# Configure resiliency settings

Get-StoragePool Pool1 | Set-ResiliencySetting -Name Simple -NumberOfColumnsDefault 4

Get-StoragePool Pool1 | Set-ResiliencySetting -Name Mirror -NumberOfColumnsDefault 2

 

# Create simple and mirrored spaces with tiering

$SSD = Get-StorageTier -FriendlyName SSDTier

$HDD = Get-StorageTier -FriendlyName HDDTier

Get-StoragePool Pool1 | New-VirtualDisk -FriendlyName Space1 -ResiliencySettingName Simple -StorageTiers $SSD, $HDD -StorageTierSizes 8GB, 32GB -WriteCacheSize 1GB

Get-StoragePool Pool1 | New-VirtualDisk -FriendlyName Space2 -ResiliencySettingName Mirror -StorageTiers $SSD, $HDD -StorageTierSizes 8GB, 32GB -WriteCacheSize 1GB

 

12. Check Storage Spaces configuration (with output)

 

PS C:\> Get-StoragePool Pool1 | Get-ResiliencySetting | FT -AutoSize

 

Name   NumberOfDataCopies PhysicalDiskRedundancy NumberOfColumns Interleave

----   ------------------ ---------------------- --------------- ----------

Simple 1                  0                      4               262144

Mirror 2                  1                      2               262144

Parity 1                  1                      Auto            262144

 

PS C:\> Get-VirtualDisk | FT -AutoSize

 

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach  Size

------------ --------------------- ----------------- ------------ --------------  ----

Space1       Simple                OK                Healthy      False          40 GB

Space2       Mirror                OK                Healthy      False          40 GB

 

PS C:\> Get-StorageTier | FT FriendlyName, MediaType, Size -AutoSize

 

FriendlyName   MediaType        Size

------------   ---------        ----

SSDTier        SSD                 0

HDDTier        HDD                 0

Space1_SSDTier SSD        8589934592

Space1_HDDTier HDD       34359738368

Space2_SSDTier SSD        8589934592

Space2_HDDTier HDD       34359738368

 

PS C:\> Get-StoragePool Pool1 | FL Size, AllocatedSize

 

Size          : 290984034304

AllocatedSize : 136365211648

 

Note: 127GB allocated after creating the two spaces. 144GB left out of 271GB total.

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              4294967296  4294967296      4294967296

 

Note: 4GB left on the SSD Tier.

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Mirror | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              2147483648  2147483648      2147483648

 

Note: 4GB left on the SSD Tier. 2GB available if mirroring.

 

PS C:\> Get-StorageTierSupportedSize HDDTier -ResiliencySettingName Simple | FT -AutoSize

 

SupportedSizes TierSizeMin  TierSizeMax TierSizeDivisor

-------------- -----------  ----------- ---------------

{}              4294967296 146028888064      4294967296

 

Note: 136GB left on the HDD Tier.

 

PS C:\> Get-StorageTierSupportedSize HDDTier -ResiliencySettingName Mirror | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              2147483648 73014444032      2147483648

 

Note: 136GB left on the HDD Tier. 68GB available if mirroring.

 

13. Create Partitions and Volumes

 

 

# Configure volume “F” on Space1

Get-VirtualDisk Space1 | Get-Disk | Set-Disk -IsReadOnly 0

Get-VirtualDisk Space1 | Get-Disk | Set-Disk -IsOffline 0

Get-VirtualDisk Space1 | Get-Disk | Initialize-Disk -PartitionStyle GPT

Get-VirtualDisk Space1 | Get-Disk | New-Partition -DriveLetter "F" -UseMaximumSize

Initialize-Volume -DriveLetter "F" -FileSystem NTFS -Confirm:$false

 

# Configure volume “G” on Space2

Get-VirtualDisk Space2 | Get-Disk | Set-Disk -IsReadOnly 0

Get-VirtualDisk Space2 | Get-Disk | Set-Disk -IsOffline 0

Get-VirtualDisk Space2 | Get-Disk | Initialize-Disk -PartitionStyle GPT

Get-VirtualDisk Space2 | Get-Disk | New-Partition -DriveLetter "G" -UseMaximumSize

Initialize-Volume -DriveLetter "G" -FileSystem NTFS -Confirm:$false

 

14. Check Partitions and Volumes (with output)

 

PS C:\> Get-Partition | ? DriveLetter -ge "F" | FT -AutoSize

 

   Disk Number: 13

 

PartitionNumber DriveLetter Offset        Size Type

--------------- ----------- ------        ---- ----

2               F           135266304 39.87 GB Basic

 

   Disk Number: 14

 

PartitionNumber DriveLetter Offset        Size Type

--------------- ----------- ------        ---- ----

2               G           135266304 39.87 GB Basic

 

PS C:\> Get-Volume | ? DriveLetter -ge "F" | FT -AutoSize

 

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size

----------- --------------- ---------- --------- ------------ -------------     ----

F                           NTFS       Fixed     Healthy           39.75 GB 39.87 GB

G                           NTFS       Fixed     Healthy           39.75 GB 39.87 GB

 

15. Create Test Files

 

 

# Create 3 files on volume “F”, place them on different tiers

1..3 | % {

   fsutil file createnew f:\file$_.dat (4GB)

   fsutil file setvaliddata f:\file$_.dat (4GB)

}

Set-FileStorageTier -FilePath f:\file1.dat -DesiredStorageTierFriendlyName Space1_SSDTier

Set-FileStorageTier -FilePath f:\file2.dat -DesiredStorageTierFriendlyName Space1_HDDTier

Get-FileStorageTier -VolumeDriveLetter F

 

# Create 3 files on volume “G”, place them on different tiers

1..3 | % {

   fsutil file createnew g:\file$_.dat (4GB)

   fsutil file setvaliddata g:\file$_.dat (4GB)

}

Set-FileStorageTier -FilePath g:\file1.dat -DesiredStorageTierFriendlyName Space2_SSDTier

Set-FileStorageTier -FilePath g:\file2.dat -DesiredStorageTierFriendlyName Space2_HDDTier

Get-FileStorageTier -VolumeDriveLetter G

 

16. Check Test Files (with output)

 

PS C:\> Dir F:

 

    Directory: f:\

 

Mode                LastWriteTime     Length Name

----                -------------     ------ ----

-a---         8/21/2013  10:09 AM 4294967296 file1.dat

-a---         8/21/2013  10:09 AM 4294967296 file2.dat

-a---         8/21/2013  10:09 AM 4294967296 file3.dat

 

PS C:\> Dir G:

 

    Directory: g:\

 

Mode                LastWriteTime     Length Name

----                -------------     ------ ----

-a---         8/21/2013  10:08 AM 4294967296 file1.dat

-a---         8/21/2013  10:08 AM 4294967296 file2.dat

-a---         8/21/2013  10:08 AM 4294967296 file3.dat

 

PS C:\sqlio> Get-Volume | ? DriveLetter -ge "F" | FT -AutoSize

 

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size

----------- --------------- ---------- --------- ------------ -------------     ----

F                           NTFS       Fixed     Healthy           27.75 GB 39.87 GB

G                           NTFS       Fixed     Healthy           27.75 GB 39.87 GB

 

PS C:\> Get-FileStorageTier -VolumeDriveLetter F | FT -AutoSize

 

FilePath     DesiredStorageTierName PlacementStatus    State

--------     ---------------------- ---------------    -----

F:\file1.dat Space1_SSDTier         Completely on tier OK

F:\file2.dat Space1_HDDTier         Partially on tier  Pending

 

PS C:\> Get-FileStorageTier -VolumeDriveLetter G | FT -AutoSize

 

FilePath     DesiredStorageTierName PlacementStatus    State

--------     ---------------------- ---------------    -----

G:\file1.dat Space2_SSDTier         Completely on tier OK

G:\file2.dat Space2_HDDTier         Partially on tier  Pending

 

 

17. Tasks for Storage Tiering

 

# Check tasks used by Storage Tiering

Get-ScheduledTask -TaskName *Tier* | FT -AutoSize

Get-ScheduledTask -TaskName *Tier* | Get-ScheduledTaskInfo

 

# Manually running the “Storage Tiers Optimization” task

Get-ScheduledTask -TaskName "Storage Tiers Optimization" | Start-ScheduledTask

 

18. Tasks for Storage Tiering (with Output)

 

PS C:\> Get-ScheduledTask -TaskName *Tier*  | FT –AutoSize

 

TaskPath                                     TaskName                                State

--------                                     --------                                -----

\Microsoft\Windows\Storage Tiers Management\ Storage Tiers Management Initialization Ready

\Microsoft\Windows\Storage Tiers Management\ Storage Tiers Optimization              Ready

 

PS C:\> Get-ScheduledTask -TaskName *Tier*  | Get-ScheduledTaskInfo

 

LastRunTime        :

LastTaskResult     : 1

NextRunTime        : 8/22/2013 1:00:00 AM

NumberOfMissedRuns : 0

TaskName           : Storage Tiers Optimization

TaskPath           : \Microsoft\Windows\Storage Tiers Management\

PSComputerName     :

 

LastRunTime        : 8/21/2013 11:18:18 AM

LastTaskResult     : 0

NextRunTime        :

NumberOfMissedRuns : 0

TaskName           : Storage Tiers Management Initialization

TaskPath           : \Microsoft\Windows\Storage Tiers Management\

PSComputerName     :

 

PS C:\> Get-ScheduledTask -TaskName "Storage Tiers Optimization" | Start-ScheduledTask

PS C:\> Get-ScheduledTask -TaskName "Storage Tiers Optimization" | Get-ScheduledTaskInfo

 

LastRunTime        : 8/21/2013 12:11:11 PM

LastTaskResult     : 267009

NextRunTime        : 8/22/2013 1:00:00 AM

NumberOfMissedRuns : 0

TaskName           : Storage Tiers Optimization

TaskPath           : \Microsoft\Windows\Storage Tiers Management\

PSComputerName     :

 

 

19. Running SQLIO

 

# These commands assume that the SQLIO2.EXE file was copied to the C:\SQLIO folder

# SQLIO workload 1 : 30 seconds, random, read, 8KB, 4 thread, 16 outstanding IOs, no buffering

# SQLIO workload 2 : 30 seconds, sequential, read, 512KB, 4 thread, 4 outstanding IOs, no buffering

 

# Check file location on tiers for volume F:

Get-FileStorageTier -VolumeDriveLetter F | FT -AutoSize

 

# Running SQLIO on F:, using File1 (SSD tier), File2 (HDD tier) and File3 (unspecified tier)

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file1.dat

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file2.dat

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file3.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file1.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file2.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file3.dat

 

# Check file location on tiers for volume G:

Get-FileStorageTier -VolumeDriveLetter G | FT -AutoSize

 

# Running SQLIO on G:, using File1 (SSD tier), File2 (HDD tier) and File3 (unspecified tier)

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file1.dat

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file2.dat

c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file3.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file1.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file2.dat

c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file3.dat

 

20. Running SQLIO (with Output)

 

PS C:\> Get-FileStorageTier -VolumeDriveLetter F | FT -AutoSize

 

FilePath     DesiredStorageTierName PlacementStatus    State

--------     ---------------------- ---------------    -----

F:\file1.dat Space1_SSDTier         Completely on tier OK

F:\file2.dat Space1_HDDTier         Completely on tier OK

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file1.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file1.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file1.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec: 35745.63

MBs/sec:   279.26

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file2.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file2.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file2.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   141.57

MBs/sec:     1.10

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file3.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file3.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file3.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   384.86

MBs/sec:     3.00

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file1.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file1.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file1.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   998.25

MBs/sec:   499.12

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file2.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file2.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file2.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:    81.30

MBs/sec:    40.65

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN f:\file3.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file f:\file3.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: f:\file3.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   148.51

MBs/sec:    74.25

 

PS C:\> Get-FileStorageTier -VolumeDriveLetter G | FT -AutoSize

 

FilePath     DesiredStorageTierName PlacementStatus    State

--------     ---------------------- ---------------    -----

G:\file1.dat Space2_SSDTier         Completely on tier OK

G:\file2.dat Space2_HDDTier         Completely on tier OK

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file1.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file1.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file1.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec: 35065.17

MBs/sec:   273.94

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file2.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file2.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file2.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   138.98

MBs/sec:     1.08

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN g:\file3.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file3.dat

        using 8KB random IOs

        enabling multiple I/Os per thread with 16 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file3.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   360.33

MBs/sec:     2.81

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file1.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file1.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file1.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   997.90

MBs/sec:   498.95

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file2.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file2.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file2.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   118.34

MBs/sec:    59.17

 

PS C:\> c:\sqlio\sqlio2.exe -s30 -fsequential -kR -b512 -t4 -o4 -BN g:\file3.dat

 

sqlio v2.15. 64bit_SG

4 threads reading for 30 secs to file g:\file3.dat

        using 512KB sequential IOs

        enabling multiple I/Os per thread with 4 outstanding

        buffering set to not use file nor disk caches (as is SQL Server)

using current size: 4096 MB for file: g:\file3.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   197.65

MBs/sec:    98.82

 

21. Summary of SQLIO Results

 

Here’s the summary of the SQLIO runs:

Resiliency  Workload          Tier           IOPs  MB/s
----------  --------          ----           ----  ----
Simple      8KB Read          File1 (SSD)  35,746   279
Simple      8KB Read          File2 (HDD)     142     1
Simple      8KB Read          File3 (Mixed)   385     3
Simple      512KB Sequential  File1 (SSD)     998   499
Simple      512KB Sequential  File2 (HDD)      81    41
Simple      512KB Sequential  File3 (Mixed)   149    74
Mirror      8KB Read          File1 (SSD)  35,065   274
Mirror      8KB Read          File2 (HDD)     139     1
Mirror      8KB Read          File3 (Mixed)   360     3
Mirror      512KB Sequential  File1 (SSD)     998   499
Mirror      512KB Sequential  File2 (HDD)     118    59
Mirror      512KB Sequential  File3 (Mixed)   198    99

 

Note 1: In general, this is working as expected. File1 (SSD Tier) shows SSD performance characteristics while File 2 (HDD Tier) behaves like a spinning disk. File3 is somewhere between the two.

Note 2: There is only one physical SSD and one physical HDD behind all the virtual layers. Both the simple and the mirrored spaces are basically limited by the hardware capabilities of those two disks.

Note 3: This shows the results of a single run of each workload, so some variance is expected. If you are running this in your own setup, you might want to run each workload multiple times and average it out. There’s some guidance to that regard at http://blogs.technet.com/b/josebda/archive/2013/03/25/sqlio-powershell-and-storage-performance-measuring-iops-throughput-and-latency-for-both-local-disks-and-smb-file-shares.aspx
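
If you want to automate that, here is a minimal sketch that repeats one of the runs above a few times and averages the IOs/sec value printed by SQLIO. It assumes the output format shown in the previous section, where the throughput line starts with “IOs/sec:”:

# Run the 8KB random read test 5 times against file1 and average the IOs/sec values
$results = 1..5 | % {
   $line = c:\sqlio\sqlio2.exe -s30 -frandom -kR -b8 -t4 -o16 -BN f:\file1.dat | Select-String "IOs/sec:" | Select-Object -First 1
   [double]($line.Line.Split(":")[1].Trim())
}
"Average IOs/sec: " + ($results | Measure-Object -Average).Average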

 

 

22. Extending existing Spaces and Volumes

 

# Check state before change

Get-VirtualDisk Space1 | FT -AutoSize

Get-StorageTier Space1* | FT FriendlyName, Size -AutoSize

Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT -AutoSize

 

# Add 4GB on the SSD Tier

Resize-StorageTier Space1_SSDTier -Size 12GB

Get-VirtualDisk Space1 | Get-Disk | Update-Disk

 

# Check after Virtual Disk change

Get-VirtualDisk Space1 | FT -AutoSize

Get-StorageTier Space1* | FT FriendlyName, Size -AutoSize

Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT -AutoSize

 

# Extend partition (also extends the volume)

Resize-Partition -DriveLetter F -Size 43.87GB
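
As an alternative to hard-coding the new size, you can ask the Storage module for the largest supported partition size and use that instead. This is just a sketch of the same step:

# Extend the F: partition to the maximum size the extended space now supports
$max = (Get-PartitionSupportedSize -DriveLetter F).SizeMax
Resize-Partition -DriveLetter F -Size $max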

 

# Check after Partition/Volume change

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT -AutoSize

Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT -AutoSize

 

23. Extending existing Spaces and Volumes (with output)

 

PS C:\> Get-VirtualDisk Space1 | FT -AutoSize

 

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach  Size

------------ --------------------- ----------------- ------------ --------------  ----

Space1       Simple                OK                Healthy      False          40 GB

 

PS C:\> Get-StorageTier Space1* | FT FriendlyName, Size –AutoSize

 

FriendlyName          Size

------------          ----

Space1_SSDTier  8589934592

Space1_HDDTier 34359738368

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}              4294967296  4294967296      4294967296

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | FT -AutoSize

 

Number Friendly Name                  OperationalStatus Total Size Partition Style

------ -------------                  ----------------- ---------- ---------------

13     Microsoft Storage Space Device Online                 40 GB GPT

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT -AutoSize

 

   Disk Number: 13

 

PartitionNumber DriveLetter Offset        Size Type

--------------- ----------- ------        ---- ----

1                           24576       128 MB Reserved

2               F           135266304 39.87 GB Basic

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT -AutoSize

 

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size

----------- --------------- ---------- --------- ------------ -------------     ----

F                           NTFS       Fixed     Healthy           27.75 GB 39.87 GB

 

PS C:\> Resize-StorageTier Space1_SSDTier -Size 12GB

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Update-Disk

 

PS C:\> Get-VirtualDisk Space1 | FT -AutoSize

 

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach  Size

------------ --------------------- ----------------- ------------ --------------  ----

Space1       Simple                OK                Healthy      False          44 GB

 

PS C:\> Get-StorageTier Space1* | FT FriendlyName, Size –AutoSize

 

FriendlyName          Size

------------          ----

Space1_SSDTier 12884901888

Space1_HDDTier 34359738368

 

PS C:\> Get-StorageTierSupportedSize SSDTier -ResiliencySettingName Simple | FT -AutoSize

 

SupportedSizes TierSizeMin TierSizeMax TierSizeDivisor

-------------- ----------- ----------- ---------------

{}                       0           0               0

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | FT -AutoSize

 

Number Friendly Name                  OperationalStatus Total Size Partition Style

------ -------------                  ----------------- ---------- ---------------

13     Microsoft Storage Space Device Online                 44 GB GPT

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT –AutoSize

 

   Disk Number: 13

 

PartitionNumber DriveLetter Offset        Size Type

--------------- ----------- ------        ---- ----

1                           24576       128 MB Reserved

2               F           135266304 39.87 GB Basic

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT –AutoSize

 

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size

----------- --------------- ---------- --------- ------------ -------------     ----

F                           NTFS       Fixed     Healthy           27.75 GB 39.87 GB

 

PS C:\> Resize-Partition -DriveLetter F -Size 43.87GB

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | FT –AutoSize

 

   Disk Number: 13

 

PartitionNumber DriveLetter Offset        Size Type

--------------- ----------- ------        ---- ----

1                           24576       128 MB Reserved

2               F           135266304 43.87 GB Basic

 

PS C:\> Get-VirtualDisk Space1 | Get-Disk | Get-Partition | Get-Volume | FT –AutoSize

 

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size

----------- --------------- ---------- --------- ------------ -------------     ----

F                           NTFS       Fixed     Healthy           31.75 GB 43.87 GB

 

 

24. Final Notes

 

  • I hope you enjoyed these step-by-step instructions. I strongly encourage you to try them out and perform the entire installation yourself. It’s a good learning experience.
  • Let me know how these steps worked for you using the comment section. If you run into any issues or found anything particularly interesting, don’t forget to mention the number of the step.

What’s new in SMB PowerShell in Windows Server 2012 R2


Windows Server 2012 R2 introduced a new version of SMB. Technically it’s SMB 3.02, but we continue to call it just SMB3. The main changes are described at http://technet.microsoft.com/en-us/library/hh831474.aspx. With this new release, we made a few changes in SMB PowerShell to support the new scenarios and features. This includes a few new cmdlets and some changes to existing cmdlets, with extra care not to break any of your existing scripts. This blog post outlines 7 sets of changes related to SMB PowerShell in Windows Server 2012 R2.

 

1) Simpler setting of ACL for the folder behind a share

 

In Windows Server 2012, the SMB share has a property that facilitates applying the share ACL to the file system folder used by the share. Here’s the syntax:

  • (Get-SmbShare -Name <ShareName>).PresetPathACL | Set-Acl

 

In Windows Server 2012 R2, we have improved this scenario by providing a proper cmdlet to apply the share ACL to the file system used by the share. Here’s the new syntax:

  • Set-SmbPathAcl -ShareName <ShareName>

 

Note 1: The Windows Server 2012 syntax continues to work with Windows Server 2012 R2, but the new syntax is much simpler and is therefore recommended.

 

Note 2: There is a known issue with Windows Server 2012 R2 Preview that causes this new cmdlet to fail when using non-Unicode languages. As a workaround, you can use the old syntax.
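
As a concrete example of the new syntax, here is a quick sketch that applies the share ACL to the folder behind a single share, and then to every non-administrative share on a file server (VMS1 is just a placeholder share name):

# Apply the share ACL of one share to its folder
Set-SmbPathAcl -ShareName VMS1

# Do the same for all non-special shares on this server
Get-SmbShare -Special $false | % { Set-SmbPathAcl -ShareName $_.Name }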

 

2) Scale-Out Rebalancing (per-share redirection)

 

Per-share redirection is the new default behavior for new SMB clients (Windows 8.1 or Windows Server 2012 R2) connecting to a Windows Server 2012 R2 Scale-Out cluster that uses a storage system that does not support Direct I/O from all nodes. The common scenario here is a Scale-Out File Server backed by Mirrored Storage Spaces.

 

Here are the changes in SMB PowerShell to support SMB Scale-Out per-share redirection:

  • New “Redirected” Boolean property added to Get-SmbConnection to indicate that per-share redirection is being used.
  • Get-SmbWitnessClient now includes a “ShareName” property, since the witness can track connections per share, not only per server.
  • Get-SmbWitnessClient now includes a “Flags” property, which will show “Share” when doing per-share redirection.
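
A quick way to check these new properties from PowerShell is sketched below (the first command runs on the SMB client, the second on a file server cluster node):

# On the SMB client: see whether each connection is using per-share redirection
Get-SmbConnection | FT ServerName, ShareName, Redirected -AutoSize

# On the file server: see the per-share witness registrations and their flags
Get-SmbWitnessClient | FT ShareName, Flags -AutoSize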

 

3) Other SMB Witness Changes

 

While the overall functionality of the SMB Witness is largely unchanged in Windows Server 2012 R2 (aside from per-share redirection), we have put some effort in improving the SMB PowerShell cmdlets associated with it. Here are the changes:

  • The Move-SmbWitnessClient cmdlet can now be referred to simply as Move-SmbClient. This is a new alias that better describes what the cmdlet actually does.
  • The default view for Get-SmbWitnessClient was improved. Here’s the new list of items shown: 


Client Computer   Witness Node  File Server   Network Name  Share Name  Client State    
Name              Name             Node Name
----------------- ------------- ------------- ------------  ----------  ------------    
JOSE-V            JOSE-A1       JOSE-A2       JOSE-F        VMS4        RequestedNoti...
JOSE-V            JOSE-A1       JOSE-A2       JOSE-F        VMS3        RequestedNoti...
JOSE-V            JOSE-A1       JOSE-A3       JOSE-F        VMS1        RequestedNoti...
JOSE-V            JOSE-A1       JOSE-A3       JOSE-F        VMS2        RequestedNoti...
JOSE-V            JOSE-A2       JOSE-A1       JOSE-F        VMS6        RequestedNoti...
JOSE-V            JOSE-A2       JOSE-A1       JOSE-F        VMS5        RequestedNoti...

 

  • There is a new “NetworkName” parameter in Move-SmbClient. If a NetworkName is specified, only the connections using that network name are moved.

Windows Server 2012 syntax for Move-SmbWitnessClient:

    • Move-SmbWitnessClient -ClientName X -DestinationNode Y

Windows Server 2012 R2 syntax:

    • Move-SmbClient -ClientName X -DestinationNode Y [ -NetworkName Z ]

 

Note 1: If the –NetworkName is omitted in the Move-SmbClient cmdlet, all client connections will be moved to the destination.

 

Note 2: When using per-share redirection, the SMB client will always move to the file server node that owns the volume behind the file share. Using Move-SmbClient (or Move-SmbWitnessClient) in that situation has no effect.
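
For example, using the machine names from the demo environment in this blog, moving only the connections from client JOSE-V that use the JOSE-F network name to node JOSE-A1 would look like this (a sketch; substitute your own names):

Move-SmbClient -ClientName JOSE-V -DestinationNode JOSE-A1 -NetworkName JOSE-F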

 

4) SMB Bandwidth Limits

 

Starting with Windows Server 2012 R2, administrators can set a bandwidth limit for one or more categories by using simple PowerShell cmdlets. They can also query the limit for each category.  

 

There are 3 categories of traffic that can be limited:

  • VirtualMachine: Hyper-V over SMB traffic. This limit can only be applied on Hyper-V hosts. 
  • LiveMigration: Hyper-V Live Migration over SMB traffic. This limit can only be applied on Hyper-V hosts. 
  • Default: All other types of SMB traffic. This limit can be applied on any SMB client.

 

A category limit is an absolute value expressed in bytes per second.  For example, a category limit of 500MB means the total throughput for this category should not exceed 500 megabytes per second.

 

The PowerShell cmdlets used to manage SMB Bandwidth Limits are:

  • Get-SmbBandwidthLimit [ –Category {Default/VirtualMachine/LiveMigration} ]
  • Set-SmbBandwidthLimit –Category {Default/VirtualMachine/LiveMigration} –BytesPerSecond x 
  • Remove-SmbBandwidthLimit –Category {Default/ VirtualMachine/LiveMigration}

 

Note 1: Before you can use Bandwidth Limits, the feature needs to be enabled. For that, you should use:

  • Add-WindowsFeature FS-SMBBW

 

Note 2: A new SMB performance counter set with an instance per category becomes available after you install the feature. The counters in this set are the same ones used today in the SMB Client Shares counter set.

 

Note 3: PowerShell allows us to specify units like KB, MB, GB after the number of bytes when specifying the BytesPerSecond parameter.

 

Note 4: The Set-SmbBandwidthLimit cmdlet won’t accept a BytesPerSecond parameter smaller than 1MB (1048576).
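
Putting it all together, a minimal end-to-end example based on the syntax above would be (the 100MB value is just an illustration):

# Enable the SMB Bandwidth Limits feature (one time)
Add-WindowsFeature FS-SMBBW

# Limit Hyper-V Live Migration over SMB traffic to 100 megabytes per second
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 100MB

# Review the configured limits and then remove the limit
Get-SmbBandwidthLimit
Remove-SmbBandwidthLimit -Category LiveMigration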

 

5) SMB Multi-instance

 

SMB Multi-instance is a new feature in Windows Server 2012 R2 that separates regular SMB traffic from CSV-related inter-node SMB traffic into distinct SMB instances. This is designed to improve isolation between the two types of traffic and improve the reliability of the SMB servers. Information related to this new CSV-only instance in Windows Server 2012 R2 is hidden in all PowerShell cmdlets.

 

However, there are the changes in SMB PowerShell so an Administrator can view information related to the hidden CSV instance:

  • The –IncludeHidden option in Get-SmbConnection and Get-SmbMultichannelConnection will show the connections associated with the hidden CSV instance. 
  • There is now an InstanceName property in the full output of Get-SmbConnection, Get-SmbMultichannelConnection, Get-SmbSession and Get-SmbOpenFile. It shows either “Default” or “CSV” (only shows if using the –IncludeHidden option).
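
For example, to see which instance each connection belongs to on a cluster node, a quick sketch is:

# Include the hidden CSV instance and show the instance name for each connection
Get-SmbConnection -IncludeHidden | FT InstanceName, ServerName, ShareName -AutoSize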

 

Note: There is really little use in inspecting the information on the hidden CSV instance, except if you’re troubleshooting CSV inter-node communications.

 

6) SMB Delegation

 

When you configure Hyper-V over SMB and you manage your Hyper-V hosts remotely using Hyper-V Manager, you might run into access denied messages. This is because you’re using your credentials from the remote machine running Hyper-V Manager on the Hyper-V host to access a third machine (the file server). This is what we call a “double-hop”, and it’s not allowed by default for security reasons. The main problem with the scenario is that an intruder who compromises one computer in your environment could then connect to other systems in your environment without the need to provide a username and password. One way to work around this issue is to connect directly to the Hyper-V host and provide your credentials at that time, avoiding the double-hop.

 

You can also address this by configuring Constrained Delegation for SMB shares, which is a process that involves changing properties in Active Directory. The security risk is reduced here because a potential intruder’s double-hop would be limited to that specific use case (using SMB shares on the specified servers). The constrained delegation process was greatly simplified in Windows Server 2012 when the Active Directory team introduced resource-based Kerberos constrained delegation, as explained at http://technet.microsoft.com/library/hh831747.aspx. However, even with this new resource-based constrained delegation, there are still quite a few steps to enable it.

 

For Hyper-V over SMB in Windows Server 2012, we provided TechNet and blog-based guidance on how to automate constrained delegation. In Windows Server 2012 R2, SMB has a new set of cmdlets to simplify the configuration of resource-based constrained delegation in SMB scenarios. Here are the new cmdlets:

  • Get-SmbDelegation –SmbServer X
  • Enable-SmbDelegation –SmbServer X –SmbClient Y 
  • Disable-SmbDelegation –SmbServer X [–SmbClient Y] [-Force]

 

Note 1: For the Disable-SmbDelegation cmdlet, if no client is specified, delegation will be removed for all clients.

 

Note 2: These 3 new SMB cmdlets rely on Active Directory PowerShell to perform their actions. For this reason, you need to install the Active Directory cmdlets before using the SMB delegation cmdlets. To install the Active Directory cmdlets, use:

  • Install-WindowsFeature RSAT-AD-PowerShell

 

Note 3: Because these cmdlets only work with the new resource-based delegation, the Active Directory forest must be at the “Windows Server 2012” functional level. To check the Active Directory forest functional level, use:

  • Get-ADForest

 

Note 4: Virtual Machine Manager uses a different method to remote into the Hyper-V host. For VMM, constrained delegation is not required for management of Hyper-V over SMB.
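
To tie the notes together, here is a minimal sketch of enabling delegation (FS1 is the file server and HV1 is the Hyper-V host; both names are hypothetical):

  # Prerequisite: install the Active Directory cmdlets (see Note 2)
  Install-WindowsFeature RSAT-AD-PowerShell

  # Prerequisite: confirm the forest functional level is Windows Server 2012 (see Note 3)
  (Get-ADForest).ForestMode

  # Allow Hyper-V host HV1 to delegate credentials to SMB shares on file server FS1, then verify
  Enable-SmbDelegation -SmbServer FS1 -SmbClient HV1
  Get-SmbDelegation -SmbServer FS1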

 

7) Disable SMB1

 

SMB1 can now be completely disabled, so that its components are not even loaded. For scenarios where SMB1 is not required, this means lower resource utilization, less need for patching and improved security.

 

To disable SMB1 completely, use the following PowerShell cmdlet:

  • Remove-WindowsFeature FS-SMB1

 

You can re-enable it by using:

  • Add-WindowsFeature FS-SMB1

 

Note 1: A reboot is required after this feature is enabled or disabled.
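
For example, here is a minimal sketch of checking the current state and then removing SMB1 (the -Restart switch reboots the server immediately, per Note 1):

  # Check whether the SMB1 components are currently installed
  Get-WindowsFeature FS-SMB1

  # Remove SMB1 and reboot to complete the change
  Remove-WindowsFeature FS-SMB1 -Restart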

 

Note 2: For information worker scenarios, if you have Windows XP clients, you absolutely still need SMB1, since that is the only SMB version supported by Windows XP. Windows Vista and Windows 7 do not need SMB1 since they support SMB2. Windows 8 and Windows 8.1 do not need SMB1, since they support both SMB2 and SMB3.

 

Note 3: For classic server scenarios, if you have Windows Server 2003 or Windows Server 2003 R2 servers, you absolutely still need SMB1, since that is the only SMB version supported by them. Windows Server 2008 and Windows Server 2008 R2 do not need SMB1 since they support SMB2. Windows Server 2012 and Windows Server 2012 R2 do not need SMB1, since they support both SMB2 and SMB3.

 

Raw notes from the Storage Developers Conference (SDC 2013)


This blog post is a compilation of my raw notes from SNIA’s SDC 2013 (Storage Developers Conference).

Notes and disclaimers:

  • These notes were typed during the talks and they may include typos and my own misinterpretations.
  • Text in the bullets under each talk are quotes from the speaker or text from the speaker slides, not my personal opinion.
  • If you feel that I misquoted you or badly represented the content of a talk, please add a comment to the post.
  • I spent limited time fixing typos or correcting the text after the event. Just so many hours in a day...
  • I have not attended all sessions (since there are 4 or 5 at a time, that would actually not be possible :-)…
  • SNIA usually posts the actual PDF decks a few weeks after the event. Attendees have access immediately.
  • You can find the event agenda at http://www.snia.org/events/storage-developer2013/agenda2013

SMB3 Meets Linux: The Linux Kernel Client
Steven French, Senior Engineer SMB3 Architecture, IBM

  • Title showing is (with the strikethrough text): CIFS SMB2 SMB2.1 SMB3 SMB3.02 and Linux, a Status Update.
  • How do you use it? What works? What is coming?
  • Who is Steven French: maintains the Linux kernel client, and is SMB3 Architect for IBM Storage
  • Excited about SMB3
  • Why SMB3 is important: cluster friendly, large IO sizes, more scalable.
  • Goals: local/remote transparency, near POSIX semantics to Samba, fast/efficient/full function/secure method, as reliable as possible over bad networks
  • Focused on SMB 2.1, 3, 3.02 (SMB 2.02 works, but lower priority)
  • SMB3 faster than CIFS. SMB3 remote file access near local file access speed (with RDMA)
  • Last year SMB 2.1, this year SMB 3.0 and minimal SMB 3.02 support
  • 308 kernel changes this year, a very active year. More than 20 developers contributed
  • A year ago 3.6-rc5 – now at 3.11 going to 3.12
  • Working on today: copy offload, full Linux xattr support, SMB3 UNIX extension prototyping, recover pending locks, starting work on Multichannel
  • Outline of changes in the latest releases (from kernel version 3.4 to 3.12), version by version
  • Planned for kernel 3.13: copy chunk, quota support, per-share encryption, multichannel, considering RDMA (since Samba is doing RDMA)
  • Improvements for performance: large IO sizes, credit based flow control, improved caching model. Still need to add compounding,
  • Status: can negotiate multiple dialects (SMB 2.1, 3, 3.02)
  • Working well: basic file/dir operations, passes most functional tests, can follow symlinks, can leverage durable and persistent handles, file leases
  • Need to work on: cluster enablement, persistent handles, witness, directory leases, per-share encryption, multichannel, RDMA
  • Plans: SMB 2.1 no longer experimental in 3.12, SMB 2.1 and 3 passing similar set of functional tests to CIFS
  • Configuration hints: adjusting rsize, wsize, max_pending, cache, smb3 signing, UNIX extension, nosharelock
  • UNIX extensions: POSIX pathnames, case sensitive path name, POSIX delete/rename/create/mkdir, minor extensions to stat/statfs, brl, xattr, symlinks, POSIX ACLs
  • Optional POSIX SMB3 features outlined: list of flags used for each capability
  • Question: Encryption: Considering support for multiple algorithms, since AES support just went in the last kernel.
  • Development is active! Would like to think more seriously about NAS appliances. This can be extended…
  • This is a nice, elegant protocol. SMB3 fits well with Linux workloads like HPC, databases. Unbelievable performance with RDMA.
  • Question: Cluster enablement? Durable handle support is in. Pieces missing for persistent handle and witness are small. Discussing option to implement and test witness.
  • Need to look into the failover timing for workloads other than Hyper-V.
  • Do we need something like p-NFS? Probably not, with these very fast RDMA interfaces…

Mapping SMB onto Distributed Storage
Christopher R. Hertel, Senior Principal Software Engineer, Red Hat
José Rivera, Software Engineer, Red Hat

  • Trying to get SMB running on top of a distributed file system, Gluster
  • Chris and Jose: Both work for RedHat, both part of the Samba team, authors, etc…
  • Metadata: data about data, pathnames, inode numbers, timestamps, permissions, access controls, file size, allocation, quota.
  • Metadata applies to volumes, devices, file systems, directories, shares, files, pipes, etc…
  • Semantics are interpreted in different contexts
  • Behavior: predictable outcomes. Make them the same throughout the environments, even if they are not exactly the same
  • Windows vs. POSIX: different metadata + different semantics = different behavior
  • That’s why we have a plugfest downstairs
  • Long list of things to consider: ADS, BRL, deleteonclose, directory change notify, NTFS attributes, offline ops, quota, etc…
  • Samba is a Semantic Translator. Clients expect Windows semantics from the server, Samba expects POSIX semantics from the underlying file system
  • UNIX extensions for SMB allows POSIX clients to bypass some of this translation
  • If Samba does not properly handle the SMB protocol, we call it a bug. If cannot handle the POSIX translation, it’s also a bug.
  • General Samba approach: Emulate the Windows behavior, translate the semantics to POSIX (ensure other local processes play by similar rules)
  • The Samba VFS layers: SMB Protocol → Initial Request Handling → VFS Layer → Default VFS Layer → actual file system
  • Gluster: Distributed File System, not a cluster file system. Brick = a directory in the underlying file system. Bricks bound together as a volume. Access via SMB, NFS, REST.
  • Gluster can be FUSE mounted. Just another access method. FUSE hides the fact that it’s Gluster underneath.
  • Explaining translations: Samba/Gluster/FUSE. Gluster is adaptable. Translator stack like Samba VFS modules…
  • Can add support for: Windows ACLs, oplocks, leases, Windows timestamps.
  • Vfs_glusterfs: Relatively new code, similar to other Samba VFS modules. Took less than a week to write.
  • Can bypass the lower VFS layers by using libgfapi. All VFS calls must be implemented to avoid errors.
  • CTDB offers three basics services: distributed metadata database (for SMB state), node failure detection/recovery, IP address service failover.
  • CTDB forms a Samba cluster. Separate from the underlying Gluster cluster. May duplicate some activity. Flexible configuration.
  • SMB testing, compared to other access methods: has different usage patterns, has tougher requirements, pushes corner cases.
  • Red Hat using stable versions, kernel 2.x or something. So using SMB1 still…
  • Fixed: Byte range locking. Fixed a bug in F_GETLK to get POSIX byte range locking to work.
  • Fixed: SMB has strict locking and data consistency requirements. Stock Gluster config failed ping_pong test. Fixed cache bugs → ping_pong passes
  • Fixed: Slow directory lookups. Samba must do extra work to detect and avoid name collisions. Windows is case-INsensitive, POSIX is case-sensitive. Fixed by using vfs_glusterfs.
  • Still working on: CTDB node banning. Under heavy load (FSCT), CTDB permanently bans a running node. Goal: reach peak capacity without node banning. New CTDB versions improved capacity.
  • Still working on: CTDB recovery lock file loss. Gluster is a distributed FS, not a Cluster FS. In replicated mode, there are two copies of each file. If Recovery Lock File is partitioned, CTDB cannot recover.
  • Conclusion: If implementing SMB in a cluster or distributed environment, you should know enough about SMB to know where to look for trouble… Make sure metadata is correct and consistent.
  • Question: Gluster and Ceph have VFS. Is Samba suitable for that? Yes. Richard wrote a guide on how to write a VFS. Discussing a few issues around passing user context.
  • Question: How to change SMB3 to be more distributed? Client could talk to multiple nodes. Gluster working on RDMA between nodes. Protocol itself could offer more about how the cluster is setup.

Pike - Making SMB Testing Less Torturous
Brian Koropoff, Consulting Software Engineer, EMC Isilon

  • Pike – written in Python – starting with a demo
  • Support for a modest subset of SMB2/3. Currently more depth than breadth.
  • Emphasis on fiddly cases like failover, complex creates
  • Mature solutions largely in C (not convenient for prototyping)
  • Why python: ubiquitous, expressive, flexible, huge ecosystem.
  • Flexibility and ease of use over performance. Convenient abstractions. Extensible, re-usable.
  • Layers: core primitives (abstract data model), SMB2/3 packet definitions, SMB2/3 client model (connection, state, request, response), test harness
  • Core primitives: Cursor (buffer+offset indicating read/write location), frame (packet model), enums, anti-boilerplate magic. Examples.
  • SMB2/SMB3 protocol (pike.smb2) header, request/response, create {request/response} context, concrete frame. Examples.
  • SMB2/SMB3 model: SMB3 object model + glue. Future, client, connection (submit, transceive, error handling), session, channel (treeconnect, create, read), tree, open, lease, oplocks.
  • Examples: Connect, tree connect, create, write, close. Oplocks. Leases.
  • Advanced uses. Manually construct and submit exotic requests. Override _encode. Example of a manual request.
  • Test harness (pike,test): quickly establish connection, session and tree connect to server. Host, credentials, share parameters taken from environment.
  • Odds and ends: NT time class, signing, key derivation helpers.
  • Future work: increase breadth of SMB2/3 support. Security descriptors, improvement to mode, NTLM story, API documentation, more tests!
  • http://github.com/emc-isilon/pike - open source, patches are welcome. Has to figure out how to accept contributions with lawyers…
  • Question: Microsoft has a test suite. It’s in C#, doesn’t work in our environment. Could bring it to the plugfest.
  • Question: I would like to work on implementing it for SMB1. What do you think? Not a priority for me. Open to it, but should use a different model to avoid confusion.
  • Example: Multichannel. Create a session, bind another channel to the same session, pretend failover occurred. Write fencing of stable write.

Exploiting the High Availability features in SMB 3.0 to support Speed and Scale
James Cain, Principal Software Architect, Quantel Ltd

  • Working with TV/Video production. We only care about speed.
  • RESTful recap. RESTful filesystems talk from SDC 2010. Allows for massive scale by storing application state in the URLs instead of in the servers.
  • Demo (skipped due to technical issues): RESTful SMB3.
  • Filling pipes: Speed (throughput) vs. Bandwidth vs. Latency. Keeping packets back to back on the wire.
  • TCP Window size used to limit it. Mitigate by using multiple wires, multiple connections.
  • Filling the pipes: SMB1 – XP era. Filling the pipes required application participation. 1 session could do about 60MBps. Getting Final Cut Pro 7 to lay over SMB1 was hard. No choice to reduce latency.
  • Filling the pipes: SMB 2.0 – Vista era. Added credits, SMB2 server can control overlapped requests using credits. Client application could make normal requests and fill the pipe.
  • Filling the pipes: SMB 2.1 – 7 era. Large MTU helps.
  • Filling the pipes: SMB 3 – 8 era. Multi-path support. Enables: RSS, Multiple NICs, Multiple machines, RDMA.
  • SMB3 added lots of other features for high availability and fault tolerance. SignKey derivation.
  • Filesystem has DirectX GUI :-) - We use GPUs to render, so our SMB3 server has Cuda compute built in too. Realtime visualization tool for optimization.
  • SMB3 Multi-machine with assumed shared state. Single SMB3 client talking to two SMB3 servers. Distributed non-homogeneous storage behind the SMB servers.
  • Second NIC (channel) initiation has no additional CREATE. No distinction on the protocol between single server or multiple server. Assume homogeneous storage.
  • Asking Microsoft to consider “NUMA for disks”. Currently, shared nothing is not possible. Session, trees, handles are shared state.
  • “SMB2++” is getting massive traction. Simple use cases are well supported by the protocol. SMB3 has a high cost of entry, but lower than writing an IFS in kernel mode.
  • There are limits to how far SMB3 can scale due to its model.
  • I know this is not what the protocol is designed to do. But want to see how far I can go.
  • It could be helped by changing the protocol to have duplicate handle semantics associated with the additional channels.
  • The protocol is really, really flexible. But I’m having a hard time doing what I was trying to do.
  • Question: You’re basically trying to do Multichannel to multiple machines. Do you have a use case? I’m experimenting with it. Trying to discover new things.
  • Question: You could use CTDB to solve the problem. How much would it slow down? It could be a solution, not an awful lot of state.             

SMB3 Update
David Kruse, Development Lead, Microsoft

  • SMB 3.02 - Don’t panic! If you’re on the road to SMB3, there are no radical changes.
  • Considered not revving the dialect and doing just capability bits, but thought it would be better to rev the dialect.
  • Dialects vs. Capabilities: Asymmetric Shares, FILE_ATTRIBUTE_INTEGRITY_STREAMS.
  • SMB 2.0 client attempting MC or CA? Consistency/documentation question.
  • A server that receives a request from a client with a flag/option/capability that is not valid for the dialect should ignore it.
  • Showing code on how to mask the capabilities that don’t make sense for a specific dialect
  • Read/Write changes: request specific flag for unbuffered IO. RDMA flag for invalidation.
  • Comparing “Traditional” File Server Cluster vs. “Scale-Out” File Server cluster
  • Outlining the asymmetric scale-out file server cluster. Server-side redirection. Can we get the client to the optimal node?
  • Asymmetric shares. New capability in the TREE_CONNECT response. Witness used to notify client to move.
  • Different connections for different shares in the same scale-out file server cluster. Share scope is the unit of resource location.
  • Client processes share-level “move” in the same fashion as a server-level “move” (disconnect, reconnects to IP, rebinds handle).
  • If the cost accessing the data is the same for all nodes, there is no need to move the client to another node.
  • Move-SmbWitnessClient will not work with asymmetric shares.
  • In Windows, asymmetric shares are typically associated with Mirrored Storage Spaces, not iSCSI/FC uniform deployment. Registry key to override.
  • Witness changes: Additional fields: Sharename, Flags, KeepAliveTimeOutInSeconds.
  • Witness changes: Multichannel notification request. Insight into arrival/loss of network interfaces.
  • Witness changes: Keepalive. Timeout for async IO are very coarse. Guarantees client and server discover lost peer in minutes instead of hours.
  • Demos in Jose’s blog. Thanks for the plug!
  • Diagnosability events. New always-on events. Example: failed to reconnect a persistent handle includes previous reconnect error and reason. New events on server and client.
  • If Asymmetric is not important to you, you don’t need to implement it.
  • SMB for IPC (Inter-process communications) – What happened to named pipes?
  • Named pipes over SMB have declined in popularity. Performance concerns with serialized IO. But this is a property of named pipes, not SMB.
  • SMB provides: discovery, negotiation, authentication, authorization, message semantics, multichannel, RDMA, etc…
  • If you can abstract your application as a file system interface, you could extend it to be remoted via SMB.
  • First example: Remote Shared Virtual Disk Protocol
  • Second example: Hyper-V Live Migration over SMB. VID issues writes over SMB to target for memory pages. Leverages SMB Multichannel, SMB Direct.
  • Future thoughts on SMB for IPC. Not a protocol change or Microsoft new feature. Just ideas shared as a thought experiment.
    • MessageFs – User mode-client and user-mode server. Named Pipes vs. MessageFs. Each offset marks a distinct transaction, enables parallel actions.
    • MemFs – Kernel mode component on the server side. Server registers a memory region and clients can access that memory region.
    • MemFs+ - What if we combine the two? Fast exchange for small messages plus high bandwidth, zero copy access for large transfers. Model maps directly to RDMA: send/receive messages, read/write memory access.
  • One last thing… On Windows 8.1, you can actually disable SMB 1.0 completely.
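
For reference, on the Windows 8.1 client this is exposed as an optional feature rather than the FS-SMB1 server feature; a minimal sketch, assuming the SMB1Protocol optional feature name:

  # Completely remove the SMB 1.0 components from a Windows 8.1 client (a reboot is required)
  Disable-WindowsOptionalFeature -Online -FeatureName SMB1Protocol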

Architecting Block and Object Geo-replication Solutions with Ceph
Sage Weil, Founder & CTO, Inktank

  • Impossible to take notes, speaker goes too fast :-)

1 S(a) 2 M 3 B(a) 4
Michael Adam, SerNet GmbH - Delivered by Volker

  • What is Samba? The open source SMB server (Samba3). The upcoming open source AD controller (Samba4). Two different projects.
  • Who is Samba? List of team members. Some 35 or so people… www.samba.org/samba/team
  • Development focus: Not a single concentrated development effort. Various companies (RedHat, SuSE, IBM, SerNet, …) Different interests, changing interests.
  • Development quality: Established. Autobuild selftest mechanism. New voluntary review system (since October 2012).
  • What about Samba 4.0 after all?
    • First (!?) open source Active Directory domain controller
    • The direct continuation of the Samba 3.6 SMB file server
    • A big success in reuniting two de-facto separated projects!
    • Also a big and important file server release (SMB 2.0 with durable handles, SMB 2.1 (no leases), SMB 3.0 (basic support))
  • History. Long slide with history from 2003-06-07 (Samba 3.0.0 beta 1) to 2012-12-11 (Samba 4.0.0). Samba4 switched to using SMB2 by default.
  • What will 4.1 bring? Current 4.1.0rc3 – final planned for 2013-09-27.
  • Samba 4.1 details: mostly stabilization (AD, file server). SMB2/3 support in smbclient, including SMB3 encryption. Server side copy. Removed SWAT.
  • Included in Samba 4.0: SMB 2.0 (durable handles). SMB 2.1 (multi-credit, large MTU, dynamic reauth), SMB 3.0 (signing, encryption, secure negotiate, durable handles v2)
  • Missing in Samba 4.0: SMB 2.1 (leasing*, resilient file handles), SMB 3.0 (persistent file handles, multichannel*, SMB direct*, witness*, cluster features, storage features*, …) *=designed, started or in progress
  • Leases: Oplocks done right. Remove 1:1 relationship between open and oplock, add lease/oplock key. http://wiki.samba.org/index.php/Samba3/SMB2#Leases
  • Witness: Explored protocol with Samba rpcclient implementation. Working on pre-req async RPC. http://wiki.samba.org/index.php/Samba3/SMB2#Witness_Notification_Protocol
  • SMB Direct:  Currently approaching from the Linux kernel side. See related SDC talk. http://wiki.samba.org/index.php/Samba3/SMB2#SMB_Direct
  • Multichannel and persistent handles: just experimentation and discussion for now. No code yet.

Keynote: The Impact of the NVM Programming Model
Andy Rudoff, Intel

  • Title is Impact of NVM Programming Model (… and Persistent Memory!)
  • What do we need to do to prepare, to leverage persistent memory
  • Why now? Programming model is decades old!
  • What changes? Incremental changes vs. major disruptions
  • What does this means to developers? This is SDC…
  • Why now?
  • One movement here: Block mode innovation (atomics, access hints, new types of trim, NVM-oriented operations). Incremental.
  • The other movement: Emerging NVM technologies (Performance, performance, perf… okay, Cost)
  • Started talking to companies in the industry → SNIA NVM Programming TWG - http://snia.org/forums/sssi/nvmp
  • NVM TWG: Develop specifications for new software “programming models” as NVM becomes a standard feature of platforms
  • If you don’t build it and show that it works…
  • NVM TWG: Programming Model is not an API. Cannot define those in a committee and push on OSVs. Cannot define one API for multiple OS platforms
  • Next best thing is to agree on an overall model.
  • What changes?
  • Focus on major disruptions.
  • Next generation scalable NVM: Talking about resistive RAM NVM options. 1000x speed up over NAND, closer to DRAM.
  • Phase Change Memory, Magnetic Tunnel Junction (MT), Electrochemical Cells (ECM), Binary Oxide Filament Cells, Interfacial Switching
  • Timing. Chart showing NAND SATA3 (ONFI2, ONFI3), NAND PCIe Gen3 x4 ONFI3 and future NVM PCIE Gen3 x4.
  • Cost of software stack is not changing, for the last one (NVM PCIe) read latency, software is 60% of it?!
  • Describing Persistent Memory…
  • Byte-addressable (as far as programming model goes), load/store access (not demand-paged), memory-like performance (would stall a CPU load waiting for PM), probably DMA-able (including RDMA)
  • For modeling, think battery-backed RAM. These are clunky and expensive, but it’s a good model.
  • It is not tablet-like memory for the entire system. It is not NAND Flash (at least not directly, perhaps with caching). It is not block-oriented.
  • PM does not surprise the program with unexpected latencies (no major page faults). Does not kick other things out of memory. Does not use page cache unexpectedly.
  • PM stores are not durable until data is flushed. Looks like a bug, but it’s always been like this. Same behavior that’s been around for decades. It’s how physics works.
  • PM may not always stay in the same address (physically, virtually). Different location each time your program runs. Don’t store pointers and expect them to work. You have to use relative pointers. Welcome to the world of file systems…
  • Types of Persistent Memory: Battery-backed RAM. DRAM saved on power failure. NVM with significant caching. Next generation NVM (still quite a bit unknown/emerging here).
  • Existing use cases: From volatile use cases (typical) to persistent memory use case (emerging). NVDIMM, Copy to Flash, NVM used as memory.
  • Value: Data sets with no DRAM footprint. RDMA directly to persistence (no buffer copy required!). The “warm cache” effect. Byte-addressable. Direct user-mode access.
  • Challenges: New programming models, API. It’s not storage, it’s not memory. Programming challenges. File system engineers and database engineers always did this. Now other apps need to learn.
  • Comparing to the change that happened when we switched to parallel programming. Some things can be parallelized, some cannot.
  • Two persistent memory programming models (there are four models, more on the talk this afternoon).
  • First: NVM PM Volume mode. PM-aware kernel module. A list of physical ranges of NVMs (GET_RANGESET).
  • For example, used by file systems, memory management, storage stack components like RAID, caches.
  • Second: NVM PM File. Uses a persistent-memory-aware file system. Open a file and memory map it. But when you do load and store you go directly to persistent memory.
  • Native file APIs and management. Did a prototype on Linux.
  • Application memory allocation. Ptr=malloc(len). Simple, familiar interface. But it’s persistent and you need to have a way to get back to it, give it a name. Like a file…
  • Who uses NVM.PM.FILE. Applications, must reconnect with blobs of persistence (name, permissions)
  • What does it means to developers?
  • Mmap() on UNIX, MapViewOfFile() on Windows. Have been around for decades. Present in all modern operating systems. Shared or Copy-on-write.
  • NVM.PM.FILE – surfaces PM to application. Still somewhat raw at this point. Two ways: 1-Build on it with additional libraries. 2-Eventually turn to language extensions…
  • All these things are coming. Libraries, language extensions. But how does it work?
  • Creating resilient data structures. Resilient to a power failure. It will be in state you left it before the power failure. Full example: resilient malloc.
  • In summary: models are evolving. Many companies in the TWG. Apps can make a big splash by leveraging this… Looking forward to libraries and language extensions.

Keynote: Windows Azure Storage – Scaling Cloud Storage
Andrew Edwards, Microsoft

  • Turning block devices into very, very large block devices. Overview, architecture, key points.
  • Overview
  • Cloud storage: Blobs, disks, tables and queues. Highly durable, available and massively scalable.
  • 10+ trillion objects. 1M+ requests per second on average. Exposed via easy and open REST APIs
  • Blobs: Simple interface to retrieve files in the cloud. Data sharing, big data, backups.
  • Disks: Built on top on blobs. Mounted disks as VHDs stored on blobs.
  • Tables: Massively scalable key-value pairs. You can do queries, scan. Metadata for your systems.
  • Queues: Reliable messaging system. Deals with failure cases.
  • Azure is spread all over the world.
  • Storage Concepts: Accounts → Containers → Blobs / Tables → Entities / Queues → Messages. URLs to identify.
  • Used by Microsoft (XBOX, SkyDrive, etc…) and many external companies
  • Architecture
  • Design Goals: Highly available with strong consistency. Durability, scalability (to zettabytes). Additional information in the SOSP paper.
  • Storage stamps: Access to blobs via the URL. LB → Front-end → Partition layer → DFS Layer. Inter-stamp partition replication.
  • Architecture layer: Distributed file system layer. JBODs, append-only file system, each extent is replicated 3 times.
  • Architecture layer: Partition layer. Understands our data abstractions (blobs, queues, etc). Massively scalable index. Log Structure Merge Tree. Linked list of extents
  • Architecture layer: Front-end layer. REST front end. Authentication/authorization. Metrics/logging.
  • Key Design Points
  • Availability with consistency for writing. All writes we do are to a log. Append to the last extent of the log.
  • Ordered the same across all 3 replicas. Success only if 3 replicas are committed. Extents get sealed (no more appends) when they get to a certain size.
  • If you lose a node, seal the old two copies, create 3 new instances to append to. Also make a 3rd copy for the old one.
  • Availability with consistency for reading. Can read from any replica. Send out parallel read requests if first read is taking higher than 95% latency.
  • Partition Layer: spread index/transaction processing across servers. If there is a hot node, split that part of the index off. Dynamically load balance. Just the index, this does not move the data.
  • DFS Layer: load balancing there as well. No disk or node should be hot. Applies to both reads and writes. Lazily move replicas around to load balancing.
  • Append only system. Benefits: simple replication, easier diagnostics, erasure coding, keep snapshots with no extra cost, works well with future drive technology. Tradeoff: GC overhead.
  • Our approach to the CAP theorem. Tradeoff in Availability vs. Consistency. Extra flexibility to achieve C and A at the same time.
  • Lessons learned: Automatic load balancing. Adapt to conditions. Tunable and extensible to tune load balancing rules. Tune based on any dimension (CPU, network, memory, tpc, GC load, etc.)
  • Lessons learned: Achieve consistently low append latencies. Ended up using SSD journaling.
  • Lessons learned: Efficient upgrade support. We update frequently, almost consistently. Handle them almost as failures.
  • Lessons learned: Pressure point testing. Make sure we’re resilient despite errors.
  • Erasure coding. Implemented at the DFS Layer. See last year’s SDC presentation.
  • Azure VM persistent disks: VHDs for persistent disks are directly stored in Windows Azure Storage blobs. You can access your VHDs via REST.
  • Easy to upload/download your own VHD and mount them. REST writes are blocked when mounted to a VM. Snapshots and Geo replication as well.
  • Separating compute from storage. Allows them to be scaled separately. Provide flat network storage. Using a Quantum 10 network architecture.
  • Summary: Durability (3 copies), Consistency (commit across 3 copies). Availability (can read from any of the 3 replicas). Performance/Scale.
  • Windows Azure developer website: http://www.windowsazure.com/en-us/develop/net
  • Windows Azure storage blog: http://blogs.msdn.com/b/windowsazurestorage
  • SOSP paper/talk: http://blogs.msdn.com/b/windowsazure/archive/2011/11/21/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx

SMB Direct update
Greg Kramer, Microsoft
Tom Talpey, Microsoft

  • Two parts: 1 - Tom shares Ecosystem status and updates, 2 - Greg shares SMB Direct details
  • Protocols and updates: SMB 3.02 is a minor update. Documented in MS-SMB2 and MS-SMBD. See Dave's talk yesterday.
  • SMB Direct specifies the SMB3 RDMA transport, works with both SMB 3.0 and SMB 3.02
  • Windows Server 2012 R2 – GA in October, download from MSDN
  • Applications using SMB3 and SMB Direct: Hyper-V VHD, SQL Server
  • New in R2: Hyper-V Live Migration over SMB, Shared VHDX (remote shared virtual disk, MS-RSVD protocol)
  • RDMA Transports: iWARP (IETF RDMA over TCP), InfiniBand, RoCE. Ethernet: iWARP and RoCE – 10 or 40GbE, InfiniBand: 32Gbps (QDR) or 54Gbps (FDR)
  • RDMA evolution: iWARP (IETF standard, extensions currently active in IETF). RoCE (routable RoCE to improve scale, DCB deployment still problematic). InfiniBand (Roadmap to 100Gbps, keeping up as the bandwidth/latency leader).
  • iWARP: Ethernet, routable, no special fabric required, Up to 40GbE with good latency and full throughput
  • RoCE: Ethernet, not routable, requires PFC/DCB, Up to 40GbE with good latency and full throughput
  • InfiniBand: Specialized interconnect, not routable, dedicated fabric and switching, Up to 56Gbps with excellent latency and throughput
  • SMB3 Services: Connection management, authentication, multichannel, networking resilience/recovery, RDMA, File IO Semantics, control and extension semantics, remote file system access, RPC
  • The ISO 7-layer model: SMB presents new value as a Session layer (RDMA, multichannel, replay/recover). Move the value of SMB up the stack.
  • SMB3 as a session layer: Applications can get network transparency, performance, recovery, protection (signing, encryption, AD integration). Not something you see with other file systems or file protocols.
  • Other: Great use by clustering (inter-node communication), quality of service, cloud deployment
  • In summary: Look to SMB for even broader application (like Hyper-V Live Migration did). Broader use of SMB Direct. Look to see greater application “fidelity” (sophisticated applications transparently served by SMB3)
  • Protocol enhancements and performance results
  • Where can we reduce IO costs? We were extremely happy about performance, there was nothing extremely easy to do next, no low-hanging fruit.
  • Diagram showing the App/SMB client/Client RNIC/Server RNIC. How requests flow in SMB Direct.
  • Interesting: client has to wait for the invalidation completion. Invalidation popped up as an area of improvement. Consumes cycles, bus. Adds IO, latency. But it’s required.
  • Why pend IO until invalidation is completed? This is storage, we need to be strictly correct. Invalidation guarantees: data is in a consistent state after DMA, peers no longer has access.
  • Registration caches cannot provides these guarantees, leading to danger of corruption.
  • Back to the diagram. There is a way to decorate a request with the invalidation → Send and Invalidate. Provides all the guarantees that we need!
  • Reduces RNIC work requests per IO by one third for high IOPs workload. That’s huge! Already supported by iWARP/RoCE/InfiniBand
  • No changes required at the SMB Direct protocol. Minor protocol change in SMB 3.02 to support invalidation. New channel value in the SMB READ and SMB WRITE.
  • Using Send and Invalidate (Server). Only one invalidate per request, have to be associated with the request in question. You can leverage SMB compounding.
  • Only the first memory descriptor in the SMB3 read/write array may be remotely invalidated. Keeping it simple.
  • Using Send and Invalidate (Client). Not a mandate, you can still invalidate “manually” if not using remote invalidate. Must validate that the response matches.
  • Performance Results (drumroll…)
  • Benchmark configuration: Client and Server config: Xeon E5-2660. 2 x ConnectX-3 56Gbps InfiniBand. Shunt filter in the IO path. Comparing WS2012 vs. WS2012 R2 on same hardware.
  • 1KB random IO. Uses RDMA send/receive path. Unbuffered, 64 queue depth.
    • Reads: 881K IOPs. 2012 R2 is +12.5% over 2012. Both client and server CPU/IO reduced (-17.3%, -36.7%)
    • Writes: 808K IOPs. 2012 R2 is +13.5% over 2012. Both client and server CPU/IO reduced (-16%, -32.7%)
  • 8KB random IO. Uses RDMA read/writes. Unbuffered, 64 queue depth.
    • Reads: 835K IOPs. 2012 R2 is +43.3% over 2012. Both client and server CPU/IO reduced (-37.1%, -33.2%)
    • Writes: 712K IOPs. 2012 R2 is +30.2% over 2012. Both client and server CPU/IO reduced (-26%, -14.9%)
  • 512KB sequential IO. Unbuffered, 12 queue depth. Already maxing out before. Remains awesome. Minor CPU utilization decrease.
    • Reads: 11,366 MBytes/sec. 2012 R2 is +6.2% over 2012. Both client and server CPU/IO reduced (-9.3%, -14.3%)
    • Writes: 11,412 MBytes/sec: 2012 R2 is +6% over 2012. Both client and server CPU/IO reduced (-12.2%, -10.3%)
  • Recap: Increased IOPS (up to 43%) and high bandwidth. Decrease CPU per IO (up to 36%).
  • Client has more CPU for applications. Server scales to more clients.
  • This includes other optimizations in both the client and the server. NUMA is very important.
  • No new hardware required. No increase in the number of connections, MRs, etc.
  • Results reflect the untuned, out-of-the-box customer experience.
  • One more thing… You might be skeptical, especially about the use of shunt filter.
  • We never get to see this in our dev environment, we don’t have the high end gear. But...
  • Describing the 3U Violin memory array running Windows Server in a clustered configuration. All flash storage. Let’s see what happens…
  • Performance on real IO going to real, continuously available storage:
    • 100% Reads – 4KiB: >1Million IOPS
    • 100% Reads – 8KiB: >500K IOPS
    • 100% Writes – 4KiB: >600K IOPS
    • 100% Writes – 8KiB: >300K IOPS
  • Questions?

A Status Report on SMB Direct (RDMA) for Samba
Richard Sharpe, Samba Team Member, Panzura

  • I work at Panzura, but this has been done on my weekends
  • Looking at options to implement SMB Direct
  • 2011 – Microsoft introduced SMB direct at SDC 2011. I played around with RDMA
  • May 2012 – Tutorial on SMB 3.0 at Samba XP
  • Mellanox supplied some IB cards to Samba team members
  • May 2013 – More presentations with Microsoft at Samba XP
  • June 2013 – Conversations with Mellanox to discuss options
  • August 2013 – Started circulating a design document
  • Another month or two before it’s hooked up with Samba.
  • Relevant protocol details: Client connections via TCP first (port 445). Session setup, connects to a share. Queries network interfaces. Place an RDMA Connection to server on port 5445, brings up SMB Direct Protocol engine
  • Client sends negotiate request, Dialect 0x300, capabilities field. Server Responds.
  • Diagram with SMB2 spec section 4.8 has an example
  • SMB Direct: Small protocol - Negotiate exchange phase, PDU transfer phase.
  • Structure of Samba. Why did it take us two years? Samba uses a fork model. Master smbd forks a child. Easy with TCP. Master does not handle SMB PDUs.
  • Separate process per connection. No easy way to transfer connection between them.
  • Diagram with Samba structure. Problem: who should listen on port 5445? Wanted RDMA connection to go to the child process.
  • 3 options:
  • 1 - Convert Samba to a threaded model, everything in one address space. Would simplify TCP as well. A lot of work… Presents other problems.
  • 2 - Separate process to handle RDMA. Master smbd, RDMA handler, multiple child smbd, shared memory. Layering violation! Context switches per send/receive/read/write. Big perf hit.
  • 3 - Kernel driver to handle RDMA. Smbdirect support / RDMA support (rdmacm, etc) / device drivers. Use IOCTLs. All RDMA work in kernel, including RDMA negotiate on port 5445. Still a layering violation. Will require both kernel and Samba knowledge.
  • I decided I will follow this kernel option.
  • Character mode device. Should be agnostic of the card used. Communicate via IOCTLs (setup, memory params, send/receive, read/write).
  • Mmap for RDMA READ and RDMA WRITE. Can copy memory for other requests. Event/callback driven. Memory registration.
  • The fact that is looks like a device driver is a convenience.
  • IOCTLs: set parameters, set session ID, get mem params, get event (includes receive, send complete), send pdu, rdma read and write, disconnect.
  • Considering option 2. Doing the implementation of option 3 will give us experience and might change later.
  • Amortizing the mode switch. Get, send, etc, multiple buffers per IOCTL. Passing an array of objects at a time.
  • Samba changes needed….
  • Goals at this stage: Get something working. Allow others to complete. It will be up on github. Longer term: improve performance with help from others.
  • Some of this work could be used by the SMB client
  • Status: A start has been made. Driver loads and unloads, listens to connections. Working through the details of registering memory. Understand the Samba changes needed.
  • Weekend project! http://github.com/RichardSharpe/smbdirect-driver
  • Acknowledgments: Microsoft. Tom Talpey. Mellanox. Or Gerlitz. Samba team members.

CDMI and Scale Out File System for Hadoop
Philippe Nicolas, Scality

  • Short summary of who is Scality. Founded 2009. HQ in SF. ~60 employees, ~25 engineers in Paris. 24x7 support team. 3 US patents. $35M in 3 rounds.
  • Scality RING. Topology and name of the product. Currently in the 4.2 release. Commodity servers and storage. Support 4 LINUX distributions. Configure Scality layer. Create a large pool of storage.
  • Ring Topology. End-to-end Parallelism. Object Storage. NewSQL DB. Replication. Erasure coding. Geo Redundancy. Tiering. Multiple access methods (HTTP/REST, CDMI, NFS, CIFS, SOFS). GUI/CLI management.
  • Usage: e-mail, file storage, StaaS, Digital Media, Big Data, HPC
  • Access methods: APIs: RS2 (S3 compatible API), Sproxyd, RS2 light, SNIA CDMI. File interface: Scality Scale Out File System (SOFS), NFS, CIFS, AFP, FTP. Hadoop HDFS. OpenStack Cinder (since April 2013).
  • Parallel network file system. Limits are huge – 2^32 volumes (FS), 2^24 namespaces, 2^64 files. Sparse files. Aggregated throughput, auto-scaling with storage or access node addition.
  • CDMI (path and ID based access). Versions 1.0.1., 1.0.2. On github. CDMI client java library (CaDMIum), set of open source filesystem tools. On github.
  • Apache Hadoop. Transform commodity hardware into a data storage service. Largely supported by industry and end user community. Industry adoption: big names adopting Hadoop.
  • Scality RING for Hadoop. Replace HDFS with the Scality FS. We validate Hortonworks and Cloudera. Example with 12 Hadoop nodes for 12 storage nodes. Hadoop task trackers on RING storage nodes.
  • Data compute and storage platform in ONE cluster. Scality Scale Out File System (SOFS)  instead of HDFS. Advanced data protection (data replication up to 6 copies, erasure coding). Integration with Hortonworks HDP 1.0 & Cloudera CDH3/CDH4. Not another Hadoop distribution!
  • Summary: This is Open Cloud Access: access local or remotely via file and block interface. Full CDMI server and client. Hadoop integration (convergence approach). Comprehensive data storage platform.

Introduction to HP Moonshot
Tracy Shintaku, HP

  • Today’s demands – pervasive computing estimates. Growing internet of things (IoT).
  • Using SoC technologies used in other scenarios for the datacenter.
  • HP Moonshot System. 4U. World’s first low-energy software-defined server. HP Moonshot 1500 Chassis.
  • 45 individually serviceable hot-plug cartridges. 2 network switches, private fabric. Passive base plane.
  • Introducing HP Proliant Moonshot Server (passing around the room). 2000 of these servers in a rack. Intel Atom S1260 2GHz, 8GB DDR ECC 1333MHz. 500GB or 1TB HDD or SSD.
  • Single server = 45 servers per chassis. Quad-server = 180 servers per chassis. Compute, storage or combination. Storage cartridges with 2 HDD shared by 8 servers.
  • Rear view of the chassis. Dual 4QSFP network uplinks (each with 4 x 40Gb), 5 hot-plug fans, power supplies, management module.
  • Ethernet – traffic isolation and stacking for resiliency with dual low-latency switches. 45 servers → dual switches → dual uplink modules.
  • Storage fabric. Different module form factors allow for different options: Local storage. Low cost boot and logging. Distributed storage and RAID. Drive slices reduce cost of a boot drive 87%.
  • Inter-cartridge private 2D Torus Ring – available in future cartridges. High speed communication lanes between servers. Ring fabric, where efficient localized traffic is beneficial.
  • Cartridge roadmap. Today to near future to future. CPU: Atom → Atom, GPU, DSP, x86, ARM. Increasing variety of workloads: from static web servers now to hosting and financial servers in the future.
  • Enablement: customer and partner programs. Partner program. Logo wall for technology partners. Solution building program. Lab, services, consulting, financing.
  • Partners include Redhat, Suse, Ubuntu, Hortonworks, MapR, Cloudera, Couchbase, Citrix, Intel, AMD, Calxeda, Applied Micro, TI, Marvell, others. There’s a lot of commonality with OpenStack.
  • Web site: http://h17007.www1.hp.com/us/en/enterprise/servers/products/moonshot/index.aspx 

NFS on Steroids: Building Worldwide Distributed File System
Gregory Touretsky, Intel

  • Intel is a big organization: 6,500 IT staff @ 59 sites, 95,200 employees, 142,000 devices.
  • Every employee doing design has a Windows machine, but also interacts with the NFS backend
  • Remote desktop in interactive pool. Talks to NFS file servers, glued together with name spaces.
  • Large batch pools that do testing. Models stored in NFS.
  • Various application servers, running various systems. Also NIS, Cron servers, event monitors, configuration management.
  • Uses Samba to provide access to NFS file servers using SMB.
  • Many sites, many projects. Diagram with map of the world and multiple projects spanning geographies.
  • Latency between 10s and 100s of ms. Bandwidth: 10s of Mbps to 10s of Gbps.
  • Challenge: how to get to our customers and provide the ability to collaborate across the globe in a secure way
  • Cross-site data access in 2012. Multiple file servers, each with multiple exports. Clients access servers on same site. Rsync++ for replication between sites (time consuming).
  • Global user and group accounts. Users belong to different groups in different sites.
  • Goals: Access every file in the world from anywhere, same path, fault tolerant, WAN friendly, every user account and group on every site, Local IO performance not compromised.
  • Options: OpenAFS (moved out many years ago, decided not to go back). Cloud storage (concern with performance). NFS client-side caching (does not work well, many issues). WAN optimization (have some in production, help with some protocols, but not suitable for NFS). NFS site-level caching (proprietary and open source NFS Ganesha). In house (decided not to go there).
  • Direct NFS mount over WAN optimized tunnel. NFS ops terminated at the remote site, multiple potential routes, cache miss. Not the right solution for us.
  • Select: Site-level NFS caching and Kerberos. Each site has NFS servers and Cache servers. Provides instant visibility and minimizes amount of data transfer across sites.
  • Cache is also writable. Solutions with write-through and write-back caching.
  • Kerberos authentication with NFS v3. There are some problems there.
  • Cache implementations: half a dozen vendors, many not suitable for WAN. Evaluating alternatives.
  • Many are unable to provide a disconnected mode of operation. That eliminated a number of vendors.
  • Consistency vs. performance. Attribute cache timeout. Nobody integrates this at the directory level. Max writeback delay.
  • Optimizations. Read cache vs. Read/Write cache, delegations, proactive attribute validation for hot files, cache pre-population.
  • Where is it problematic? Application is very NFS unfriendly, does not work well with caching. Some cases it cannot do over cache, must use replication.
  • Problems: Read once over high latency link. First read, large file, interactive work. Large % of non-cacheable ops (write-through). Seldom access, beyond cache timeout.
  • Caching is not a business continuity solution. Only a partial copy of the data.
  • Cache management implementation. Doing it at scale is hard. Nobody provides a solution that fits our needs.
  • Goal: self-service management for data caching. Today administrators are involved in the process.
  • Use cases: cache my disk at site X, modify cache parameters, remove cache, migrate source/cache, get cache statistics, shared capacity management, etc.
  • Abstract the differences between the different vendors with this management system.
  • Management system example: report with project, path (mount point), size, usage, cached cells. Create cell in specific site for specific site.
  • Cache capacity planning. Goal: every file to be accessible on-demand everywhere.
  • Track cache usage by org/project. Shared cache capacity, multi-tenant. Initial rule of thumb: 7-10% of the source capacity, seeding capacity at key locations
  • Usage models: Remote validation (write once, read many). Get results back from remote sites (write once, read once). Drop box (generate in one site, get anywhere). Single home directory (avoid home directory in every site for every user, cache remote home directories). Quick remote environment setup, data access from branch location.
  • NFS (RPC) Authentication. Comparing AUTH_SYS and RPCSEC_GSS (KRB5). Second one uses an external KDC, gets past the AUTH_SYS limitation of 16 group IDs.
  • Bringing Kerberos? Needs to make sure this works as well as Windows with Active Directory. Need to touch everything (Linux client, NFS file servers, SSH, batch scheduler, remote desktop/interactive servers, name space/automounter (trusted hosts vs. regular hosts), Samba (used as an SMB gateway to NFS), setuid/sudo, cron jobs and service accounts (keytab management system),
  • Supporting the transition from legacy mounts to Kerberos mount. Must support a mixed environment. Introducing second NIS domain.
  • Welcome on board GDA airlines (actual connections between different sites). Good initial feedback from users (works like magic!)
  • Summary: NFS can be accessed over WAN – using NFS caching proxy. NFSv3 environment can be kerberized (major effort is required, transition is challenging, it would be as challenging for NFSv5/KRB)

Forget IOPS: A Proper Way to Characterize & Test Storage Performance
Peter Murray, SwiftTest

  • About what we learned in the last few years
  • Evolution: Vendor IOPs claims, test in production and pray, validate with freeware tools (iometer, IOZone), validate with workload models
  • What is storage validation? Characterize the various applications, workloads. Diagram: validation appliance, workload emulations, storage under test.
  • Why should you care? Because customers do care! Product evaluations, vendor bakeoffs, new feature and technology evaluations, etc…
  • IOPS: definition from the SNIA dictionary. Not really well defined. One size does not fit all. Looking at different sizes.
  • Real IO does not use a fixed size. Read/write may be a small portion of it in certain workloads. RDMA read/write may erode the usefulness of isolated read/write.
  • Metadata: data about data. Often in excess of 50%, sometimes more than 90%. GoDaddy mentioned that 94% of workloads are not read/write.
  • Reducing metadata impact: caching with ram, flash, SSD helps but it’s expensive.
  • Workloads: IOPS, metadata and your access pattern. Write/read, random/sequential, IO/metadata, block/chunk size, etc.
  • The importance of workloads: Understand overload and failure conditions. Understand server, cluster, deduplication, compression, network configuration and conditions
  • Creating and understanding workloads. Access patterns (I/O mix: read/write%, metadata%) , file system (depth, files/folder, file size distribution), IO parameters (block size, chunk size, direction), load properties (number of users, actions/second, load variability/time)
  • Step 1 - Creating a production model. It’s an art, working to make it a science. Production stats + packet captures + pre-built test suites = accurate, realistic work model.
  • Looking at various workload analysis
  • Workload re-creation challenges: difficult. Many working on these. Big data, VDI, general VM, infinite permutations.
  • Complex workloads emulation is difficult and time consuming. You need smart people, you need to spend the time.
  • Go Daddy shared some of the work on simulation of their workload. Looking at diagram with characteristics of a workload.
  • Looking at table with NFSv3/SMB2 vs. file action distribution.
  • Step 2: Run workload model against the target.
  • Step 3: Analyze the results for better decisions. Analytics leads to Insight. Blocks vs. file. Boot storm handing, limits testing, failure modes, effects of flash/dedup/tiering/scale-out.
  • I think we’ll see dramatic changes with the use of Flash. Things are going to change in the next few years.
  • Results analysis: Performance. You want to understand performance, spikes during the day, what causes them. Response times, throughput.
  • Results analysis: Command mix. Verify that the execution reflects the expected mix. Attempts, successes, errors, aborts.
  • Summary: IOPs alone cannot characterize real app storage performance. Inclusion of metadata is essential, workload modeling and purpose-build load generation appliances are the way to emulate applications. The more complete the emulation, the deeper the understanding.
  • If we can reduce storage cost from 40% to 20% of the solution by better understanding the workload, you can save a lot of money.

pNFS, NFSv4.1, FedFS and Future NFS Developments
Tom Haynes, NetApp

  • Tom covering for Alex McDonald, who is sick. His slides.
  • We want to talk about how the protocol gets defined, how it interacts with different application vendors and customers.
  • Looking at what is happening on the Linux client these days.
  • NFS: Ubiquitous and everywhere. NFSv3 is very successful, we can’t dislodge it. We thought everyone would go for NFSv4 and it’s now 10 years later…
  • NFSv2 in 1983, NFSv3 in 1995, NFSv4 in 2003, NFSv4.1 in 2010. NFSv4.2 to be agreed at the IETF – still kinks in the protocol that need to be ironed out. 2000=DAS, 2010=NAS, 2020=Scale-Out
  • Evolving requirements. Adoption is slow. Lack of clients was a problem with NFSv4. NFSv3 was just “good enough”. (It actually is more than good enough!)
  • Industry is changing, as are requirements. Economic trends (cheap and fast cluster, cheap and fast network, etc…)
  • Performance: NFSv3 single threaded bottlenecks in applications (you can work around it).
  • Business requirements. Reliability (sessions) is a big requirement
  • NFSv4 and beyond.
  • Areas for NFSv4, NFSv4.1 and pNFS: Security, uniform namespaces, statefulness/sessions, compound operations, caching (directory and file delegations), parallelization (layout and pNFS)
  • Future NFSv4.2 and FedFS (Global namespace; IESG has approved Dec 2012)
  • NFSv4.1 failed to talk to the applications and customers and ask what they needed. We did that for NFSv4.2
  • Selecting the application for NFSv4.1, planning, server and client availability. High level overview
  • Selecting the parts: 1 – NFSv4.1 compliant server (Files, blocks or objects?), 2-compliant client. The rise of the embedded client (Oracle, VMware). 3 – Auxiliary tools (Kerberos, DNS, NTP, LDAP). 4 – If you can, use NFS v4.1 over NFSv4.
  • If you’re implementing something today, skip NFS v4 and go straight to NFS v4.1
  • First task: select an application: Home directories, HPC applications.
  • Don’t select: Oracle (use dNFS built in), VMware and other virtualization tools (NFSv3). Oddball apps that expect to be able to internally manage NFSv3 “maps”. Any application that required UDP, since v4.1 doesn’t support anything but TCP.
  • NFSv4 stateful clients. Gives client independence (client has state). Allows delegation and caching. No automounter required, simplified locking
  • Why? Compute nodes work best with local data, NFSv4 eliminates the need for local storage, exposes more of the back-end storage functionality (hints), removes stale locks (major source of NFSv3 irritation)
  • NFSv4.1 Delegations. Server delegates certain responsibilities to the client (directory and file, caching). Read and write delegation. Allows clients to locally service operations (open, close, lock, etc.)
  • NFSv4.1 Sessions. In v3, server never knows if client got the reply message. In v4.1, sessions introduced.
  • Sessions: Major protocol infrastructure change. Exactly once semantics (EOS), bounded size of reply cache. Unlimited parallelism. Maintains server’s state relative to the connections belonging to a client.
  • Use delegation and caching transparently; client and server provide transparency. Session locks are cleaned up automatically.
  • NFSv4 Compound operations – NFSv3 protocol can be “chatty”, unsuitable for WANs with poor latency. Typical NFSv3: open, read & close a file. Compounds many operations into one to reduce wire time and simple error recovery.
  • GETATTR is the bad boy. We spent 10 years with the Linux client to get rid of many of the GETATTR (26% of SPECsfs2008).
  • NFSv4 Namespace. Uniform and “infinite” namespace. Moving from user/home directories to datacenter and corporate use. Meets demand for “large scale” protocol. Unicode support for UTF-8 codepoints. No automounter required (simplifies administration). Pseudo-file system constructed by the server.
  • Looking at NFSv4 Namespace example. Consider the flexibility of pseudo-filesystems to permit easier migration.
  • NFSv4 I18N Directory and File Names. Uses UTF-8; check filenames for compatibility. Review existing NFSv3 names to ensure they are 7-bit ASCII clean.
  • NFSv4 Security. Strong security framework. ACLs for security and Windows compatibility. Security with Kerberos. NFSv4 can be implemented without Kerberos security, but not advisable.
  • Implementing without Kerberos (no security is a last resort!). NFSv4 represents users/groups as strings (NFSv3 used 32-bit integers, UID/GID). Requires UID/GID to be converted to all numeric strings.
  • Implementing with Kerberos. Find a security expert. Consider using Windows AD Server.
  • NFSv4 Security. Firewalls. NFSv4 has no auxiliary protocols. Uses port 2049 with TCP only. Just open that port.
  • NFSv4 Layouts. Files, objects and block layouts. Flexibility for storage that underpins it. Location transparent. Layouts available from various vendors.
  • pNFS. Can aggregate bandwidth. Modern approach, relieves issues associated with point-to-point connections.
  • pNFS Filesystem implications.
  • pNFS terminology. Important callback mechanism to provide information about the resource.
  • pNFS: Commercial server implementations. NetApp has it. Panasas is in the room as well. Can’t talk about other vendors…
  • Going very fast through a number of slides on pNFS: NFS client mount, client to MDS, MDS layout to NFS client, pNFS client gets DEVICEINFO from MDS.
  • In summary: Go adopt NFS 4.1, it’s the greatest thing since sliced bread, skip NFS 4.0
  • List of papers and references. RFCs: 1813 (NFSv3), 3530 (NFSv4), 5661 (NFSv4.1), 5663 (NFSv4.1 block layout), 5664 (NFSv4.1 object layout)

pNFS Directions / NFSv4 Agility
Adam Emerson, CohortFS, LLC

  • Stimulate discussion about agility as a guiding vision for future protocol evaluation
  • NFSv4: A standard file access/storage protocol that is agile
  • Incremental advances shouldn’t require a new access protocol. Capture more value from the engineering already done. Retain broad applicability, yet adapt quickly to new challenges/opportunities
  • NFSv4 has delivered (over 10+ years of effort) on a set of features designers had long aspired to: atomicity, consistency, integration, referrals, single namespaces
  • NFSv4 has sometimes been faulted for delivering slowly and imperfectly on some key promises: flexible and easy wire security, capable and interoperable ACLs, RDMA acceleration
  • NFSv4 has a set of Interesting optional features not widely implemented: named attributes, write delegations, directory delegations, security state verifier, retention policy
  • Related discussion in the NFSv4 Community (IETF): The minor version/extension debate: de-serializing independent, potentially parallel extension efforts, fixing defect in prior protocol revisions, rationalizing past and future extension mechanisms
  • Related discussion in the NFSv4 Community (IETF): Extensions drafts leave my options open, but prescribes: process to support development of new features proposals in parallel, capability negotiation, experimentation
  • Embracing agility: Noveck formulation is subtle: rooted in NFS and WG, future depends on participants, can encompass but perhaps does not call out for an agile future.
  • Capability negotiation and experimental codepoint ranges strongly support agility. What we really want is a model that encourages movement of features from private experimentation to shared experimentation to standardization.
  • Efforts promoting agility: user-mode (and open source) NFSv4 servers (Ganesha, others?) and clients (CITI Windows NFSv4.1 client, library client implementations)
  • Some of the people in the original CITI team now working with us and are continuing to work on it
  • library client implementations: Allow novel semantics and features like pre-seeding of files, HPC workloads, etc.
  • NFSv4 Protocol Concepts promoting agility: Not just new RPCs and union types.
  • Compound: Grouping operations with context operations. Context evolves with operations and inflects the operations. It could be pushed further…
  • Named Attributes: Support elaboration of conventions and even features above the protocol, with minimal effort and coordination. Subfiles, proplists. Namespace issues: System/user/other, non-atomicity, not inlined.
  • Layout: Powerful structuring concept carrying simplified transaction pattern. Typed, Operations carry opaque data nearly everywhere, application to data striping compelling.
  • Futures/experimental work – some of them are ridiculous and I apologize in advance
  • pNFS striping flexibility/flexible files (Halevy). Per-file striping and specific parity applications to file layout. OSDv2 layout, presented at IETF 87.
  • pNFS metastripe (Eisler, further WG drafts). Scale-out metadata and parallel operations for NFSv4. Generalizing parallel access concept of NFSv4 for metadata. Built on layout and attribute hints. CohortFS prototyping metastripe on a parallel version of the Ceph file system. NFSv4 missing a per-file redirect, so this has file redirection hints.
  • End-to-end Data Integrity (Lever/IBM). Add end-to-end data integrity primitives (NFSv4.2). Build on new READ_PLUS and WRITE ops. Potentially high value for many applications.
  • pNFS Placement Layouts (CohortFS). Design for algorithmic placement in pNFS layout extension. OSD selection and placement computed by a function returned at GETDEVICEINFO. Client execution of placement codes, complex parity, volumes, etc.
  • Replication Layouts (CohortFS). Client based replication with integrity. Synchronous wide-area replication. Built on Layout.
  • Client Encryption (CohortFS). Relying on named attribute extension only, could use atomicity. Hopefully combined with end-to-end integrity being worked on
  • Cache consistency. POSIX/non-CTO recently proposed (eg, Eshel/IBM). Potentially, more generality. Eg, flexible client cache consistency models in NFSv4. Add value to existing client caching like CacheFS.
  • New participants. You? The future is in the participants…

Windows Server 2012 R2: Which version of the SMB protocol (SMB 1.0, SMB 2.0, SMB 2.1, SMB 3.0 or SMB 3.02) are you using?


Note: This blog post is a Windows Server 2012 R2 update on a previous version focused on Windows Server 2012.

 

1. Introduction

With the release of Windows 8.1 and Windows Server 2012 R2, I am frequently asked how older versions of Windows will behave when connecting to or from these new versions. Upgrading to a new version of SMB has happened a few times over the years, and the protocol itself includes a negotiation process by which clients and servers agree on the highest version that both support.

 

2. Versions

There are several different versions of SMB used by Windows operating systems:

  • CIFS – The ancient version of SMB that was part of Microsoft Windows NT 4.0 in 1996. SMB1 supersedes this version.
  • SMB 1.0 (or SMB1) – The version used in Windows 2000, Windows XP, Windows Server 2003 and Windows Server 2003 R2
  • SMB 2.0 (or SMB2) – The version used in Windows Vista (SP1 or later) and Windows Server 2008
  • SMB 2.1 (or SMB2.1) – The version used in Windows 7 and Windows Server 2008 R2
  • SMB 3.0 (or SMB3) – The version used in Windows 8 and Windows Server 2012
  • SMB 3.02 (or SMB3) – The version used in Windows 8.1 and Windows Server 2012 R2

Windows NT is no longer supported, so CIFS is definitely out. Windows Server 2003 R2 with a current service pack is under Extended Support, so SMB1 is still around for a little while. SMB 2.x in Windows Server 2008 and Windows Server 2008 R2 is under Mainstream Support until 2015. You can find the most current information on the support lifecycle page for Windows Server. The information is subject to the Microsoft Policy Disclaimer and Change Notice. You can also use the support pages to find support policy information for Windows XP, Windows Vista, Windows 7 and Windows 8.

In Windows 8.1 and Windows Server 2012 R2, we introduced the option to completely disable CIFS/SMB1 support, including the actual removal of the related binaries. While this is not the default configuration, we recommend disabling this older version of the protocol in scenarios where it’s not useful, like Hyper-V over SMB. You can find details about this new option in item 7 of this blog post: What’s new in SMB PowerShell in Windows Server 2012 R2.
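
As a quick illustration of those two options, here is a minimal sketch, assuming you are on Windows Server 2012 R2 and have confirmed that nothing in your environment still depends on SMB1 (always test before rolling out broadly):

# Option 1: leave the binaries in place but disable the SMB1 protocol on the SMB server
Set-SmbServerConfiguration -EnableSMB1Protocol $false

# Verify the protocol setting
Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol

# Option 2: remove the SMB1 binaries entirely (Windows Server 2012 R2 feature name FS-SMB1)
Remove-WindowsFeature FS-SMB1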

 

3. Negotiated Versions

Here’s a table to help you understand what version you will end up using, depending on what Windows version is running as the SMB client and what version of Windows is running as the SMB server:

Client OS \ Server OS       Windows 8.1     Windows 8       Windows 7       Windows Vista   Previous
                            WS 2012 R2      WS 2012         WS 2008 R2      WS 2008         versions
--------------------------  --------------  --------------  --------------  --------------  --------
Windows 8.1 / WS 2012 R2    SMB 3.02        SMB 3.0         SMB 2.1         SMB 2.0         SMB 1.0
Windows 8 / WS 2012         SMB 3.0         SMB 3.0         SMB 2.1         SMB 2.0         SMB 1.0
Windows 7 / WS 2008 R2      SMB 2.1         SMB 2.1         SMB 2.1         SMB 2.0         SMB 1.0
Windows Vista / WS 2008     SMB 2.0         SMB 2.0         SMB 2.0         SMB 2.0         SMB 1.0
Previous versions           SMB 1.0         SMB 1.0         SMB 1.0         SMB 1.0         SMB 1.0

* WS = Windows Server

  

4. Using PowerShell to check the SMB version

In Windows 8 or Windows Server 2012, there is a new PowerShell cmdlet that can easily tell you what version of SMB the client has negotiated with the File Server. You simply access a remote file server (or create a new mapping to it) and use Get-SmbConnection. Here’s an example:

PS C:\> Get-SmbConnection
 

ServerName   ShareName  UserName            Credential          Dialect   NumOpens
----------   ---------  --------            ----------          -------   --------
FileServer1  IPC$       DomainName\UserN... DomainName.Testi... 3.00      0
FileServer1  FileShare  DomainName\UserN... DomainName.Testi... 3.00      14
FileServ2    FS2        DomainName\UserN... DomainName.Testi... 3.02      3 
VNX3         Share1     DomainName\UserN... DomainName.Testi... 3.00      6
Filer2       Library    DomainName\UserN... DomainName.Testi... 3.00      8
DomainCtrl1  netlogon   DomainName\Compu... DomainName.Testi... 2.10      1

In the example above, a server called “FileServer1” was able to negotiate up to version 3.0, while FileServ2 negotiated version 3.02, which means that both the client and that server support the latest version of the SMB protocol. You can also see that another server called “DomainCtrl1” was only able to negotiate up to version 2.1. You can probably guess that it’s a domain controller running Windows Server 2008 R2. Some of the servers on the list are not running Windows, showing the dialect that these non-Windows SMB implementations negotiated with this specific Windows client.

If you just want to find the version of SMB running on your own computer, you can use a loopback share combined with the Get-SmbConnection cmdlet. Here’s an example:

PS C:\> dir \\localhost\c$
 
Directory: \\localhost\c$

 
Mode                LastWriteTime     Length Name

----                -------------     ------ ----
d----         5/19/2012   1:54 AM            PerfLogs
d-r--          6/1/2012  11:58 PM            Program Files
d-r--          6/1/2012  11:58 PM            Program Files (x86)
d-r--         5/24/2012   3:56 PM            Users
d----          6/5/2012   3:00 PM            Windows
 
PS C:\> Get-SmbConnection -ServerName localhost
 
ServerName  ShareName  UserName            Credential          Dialect  NumOpens
----------  ---------  --------            ----------          -------  --------
localhost   c$         DomainName\UserN... DomainName.Testi... 3.02     0

 

You have about 10 seconds after you issue the “dir” command to run the “Get-SmbConnection” cmdlet. The SMB client will tear down the connections if there is no activity between the client and the server. It might help to know that you can use the alias “gsmbc” instead of the full cmdlet name.
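
If the 10-second window trips you up, here is a minimal sketch that touches the share and queries the connection in a single line (assuming the default administrative share c$ is reachable):

PS C:\> dir \\localhost\c$ | Out-Null; Get-SmbConnection -ServerName localhost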

 

5. Features and Capabilities

Here’s a very short summary of what changed with each version of SMB:

  • From SMB 1.0 to SMB 2.0 - The first major redesign of SMB
    • Increased file sharing scalability
    • Improved performance
      • Request compounding
      • Asynchronous operations
      • Larger reads/writes
    • More secure and robust
      • Small command set
      • Signing now uses HMAC SHA-256 instead of MD5
      • SMB2 durability
  • From SMB 2.0 to SMB 2.1
    • File leasing improvements
    • Large MTU support
    • BranchCache
  • From SMB 2.1 to SMB 3.0
    • Availability
      • SMB Transparent Failover
      • SMB Witness
      • SMB Multichannel
    • Performance
      • SMB Scale-Out
      • SMB Direct (SMB 3.0 over RDMA)
      • SMB Multichannel
      • Directory Leasing
      • BranchCache V2
    • Backup
      • VSS for Remote File Shares
    • Security
      • SMB Encryption using AES-CCM (optional; see the example after this list)
      • Signing now uses AES-CMAC
    • Management
      • SMB PowerShell
      • Improved Performance Counters
      • Improved Eventing
  • From SMB 3.0 to SMB 3.02
    • Automatic rebalancing of Scale-Out File Server clients
    • Improved performance of SMB Direct (SMB over RDMA)
    • Support for multiple SMB instances on a Scale-Out File Server

You can get additional details on the SMB 2.0 improvements listed above at
http://blogs.technet.com/b/josebda/archive/2008/12/09/smb2-a-complete-redesign-of-the-main-remote-file-protocol-for-windows.aspx

You can get additional details on the SMB 3.0 improvements listed above at
http://blogs.technet.com/b/josebda/archive/2012/05/03/updated-links-on-windows-server-2012-file-server-and-smb-3-0.aspx

You can get additional details on the SMB 3.02 improvements in Windows Server 2012 R2 at
http://technet.microsoft.com/en-us/library/hh831474.aspx

 

6. Recommendation

We strongly encourage you to update to the latest version of SMB, which will give you the most scalability, the best performance, the highest availability and the most secure SMB implementation.

Keep in mind that Windows Server 2012 Hyper-V and Windows Server 2012 R2 Hyper-V only support SMB 3.0 for remote file storage. This is due mainly to the availability features (SMB Transparent Failover, SMB Witness and SMB Multichannel), which did not exist in previous versions of SMB. The additional scalability and performance is also very welcome in this virtualization scenario. The Hyper-V Best Practices Analyzer (BPA) will warn you if an older version is detected.

 

7. Conclusion

We’re excited about SMB3, but we are also always concerned about keeping as much backwards compatibility as possible. Both SMB 3.0 and SMB 3.02 bring several key new capabilities and we encourage you to learn more about them. We hope you will be convinced to start planning your upgrades as early as possible.

 


Note 1: Protocol Documentation

If you consider yourself an SMB geek and you actually want to understand the SMB NEGOTIATE command in greater detail, you can read the [MS-SMB2-Preview] protocol documentation (which covers SMB 2.0, 2.1, 3.0 and 3.02), currently available from http://msdn.microsoft.com/en-us/library/ee941641.aspx. In regards to protocol version negotiation, you should pay attention to the following sections of the document:

  • 1.7: Versioning and Capability Negotiation
  • 2.2.3: SMB2 Negotiate Request
  • 2.2.4: SMB2 Negotiate Response

Section 1.7 includes this nice state diagram describing the inner workings of protocol negotiation:

[Figure: protocol negotiation state diagram from section 1.7 of the MS-SMB2 specification]

 

Note 2: Third-party implementations

There are several implementations of the SMB protocol from someone other than Microsoft. If you use one of those implementations of SMB, you should ask whoever is providing the implementation which version of SMB they implement for each version of their product. Here are a few of these implementations of SMB:

Please note that this is not a complete list of implementations and the list is bound to become obsolete the minute I post it. Please refer to the specific implementers for up-to-date information on their specific implementations and which versions and optional portions of the protocol they offer.

Networking configurations for Hyper-V over SMB in Windows Server 2012 and Windows Server 2012 R2


One of the questions regarding Hyper-V over SMB that I get the most relates to how the network should be configured. Networking is key to several aspects of the scenario, including performance, availability and scalability.

The main challenge is to provide a fault-tolerant and high-performance network for the two clusters typically involved: the Hyper-V cluster (also referred to as the Compute Cluster) and the Scale-out File Server Cluster (also referred to as the Storage Cluster).

Not too long ago, the typical configuration for virtualization deployments would call for up to 6 distinct networks for these two clusters:

  • Client (traffic between the outside and VMs running in the Compute Cluster)
  • Storage (main communications between the Compute and Storage clusters)
  • Cluster (communication between nodes in both clusters, including heartbeat)
  • Migration (used for moving VMs between nodes in the Compute Cluster)
  • Replication (used by Hyper-V replica to send changes to another site)
  • Management (used to configuring and monitoring the systems, typically also including DC and DNS traffic)

These days, it’s common to consolidate these different types of traffic, with the proper fault tolerance and Quality of Service (QoS) guarantees.

There are certainly many different ways to configure the network for your Hyper-V over SMB, but this blog post will focus on two of them:

  • A basic fault-tolerant solution using just two physical network ports per node
  • A high-end solution using RDMA networking for the highest throughput, highest density, lowest latency and low CPU utilization.

Both configurations presented here work with Windows Server 2012 and Windows Server 2012 R2, the two versions of Windows Server that support the Hyper-V over SMB scenario.

 

Configuration 1 – Basic fault-tolerant Hyper-V over SMB configuration with two non-RDMA ports

 

The solution below uses two network ports for each node of both the Compute Cluster and the Storage Cluster. NIC teaming is the main technology used for fault tolerance and load balancing. A PowerShell sketch of this setup follows the notes below.

[Diagram] Configuration 1: basic fault-tolerant Hyper-V over SMB configuration with two non-RDMA ports per node

Notes:

  • A single dual-port network adapter per host can be used. Network failures are usually related to cables and switches, not the NIC itself. If the NIC does fail, failover clustering on the Hyper-V or Storage side would kick in. Two network adapters, each with one port, are also an option.
  • The 2 VNICs on the Hyper-V host are used to provide additional throughput for the SMB client via SMB Multichannel, since the VNIC does not support RSS (Receive Side Scaling, which helps spread the CPU load of networking activity across multiple cores). Depending on configuration, increasing it up to 4 VNICs per Hyper-V host might be beneficial to increase throughput.
  • You can use additional VNICs that are dedicated for other kinds of traffic like migration, replication, cluster and management. In that case, you can optionally configure SMB Multichannel constraints to limit the SMB client to a specific subset of the VNICs. More details can be found in item 7 of the following article: The basics of SMB Multichannel, a feature of Windows Server 2012 and SMB 3.0
  • If RDMA NICs are used in this configuration, their RDMA capability will not be leveraged, since the physical port capabilities are hidden behind NIC teaming and the virtual switch.
  • Network QoS should be used to tame each individual type of traffic on the Hyper-V host. In this configuration, it’s recommended to implement the network QoS at the virtual switch level. See http://technet.microsoft.com/en-us/library/jj735302.aspx for details (the above configuration matches the second one described in the linked article).
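
To make Configuration 1 more concrete, here is a minimal PowerShell sketch of the Hyper-V host side: a NIC team, a virtual switch with weight-based QoS, and two host VNICs for SMB. The adapter, team, switch and VNIC names, as well as the bandwidth weights, are assumptions for illustration only; adjust them to your environment.

# Team the two physical ports (adapter names are hypothetical)
New-NetLbfoTeam -Name HostTeam -TeamMembers "NIC1","NIC2"

# Create a virtual switch on the team with weight-based minimum bandwidth (QoS)
New-VMSwitch -Name HostSwitch -NetAdapterName HostTeam -MinimumBandwidthMode Weight -AllowManagementOS $false

# Add two host VNICs for SMB traffic (SMB Multichannel will use both)
Add-VMNetworkAdapter -ManagementOS -Name SMB-A -SwitchName HostSwitch
Add-VMNetworkAdapter -ManagementOS -Name SMB-B -SwitchName HostSwitch

# Give the SMB VNICs a minimum bandwidth weight so storage traffic is protected
Set-VMNetworkAdapter -ManagementOS -Name SMB-A -MinimumBandwidthWeight 40
Set-VMNetworkAdapter -ManagementOS -Name SMB-B -MinimumBandwidthWeight 40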

 

Configuration 2 - High-performance fault-tolerant Hyper-V over SMB configuration with two RDMA ports and two non-RDMA ports

 

The solution below requires four network ports for each node of both the Compute Cluster and the Storage Cluster, two of them being RDMA-capable. NIC teaming is the main technology used for fault tolerance and load balancing on the two non-RDMA ports, but SMB Multichannel covers those capabilities for the two RDMA ports.

[Diagram] Configuration 2: high-performance fault-tolerant Hyper-V over SMB configuration with two RDMA ports and two non-RDMA ports per node

Notes:

  • Two dual-port network adapters per host can be used, one RDMA and one non-RDMA.
  • In this configuration, Storage, Migration and Clustering traffic should leverage the RDMA path. The client, replication and management traffic should use the teamed NIC path.
  • In this configuration, if using Windows Server 2012 R2, Hyper-V should be configured to use SMB for Live Migration. This is not the default setting (see the sketch after these notes).
  • The SMB client will naturally prefer the RDMA paths, so there is no need to specifically configure that preference via SMB Multichannel constraints.
  • There are three different types of RDMA NICs that can be used: iWARP, RoCE and InfiniBand. Below are links to step-by-step configuration instructions for each one:
  • Network QoS should be used to tame traffic flowing through the virtual switch on the Hyper-V host. If your NIC and switch support Data Center Bridging (DCB) and Priority Flow Control (PFC), there are additional options available as well. See http://technet.microsoft.com/en-us/library/jj735302.aspx for details (the above configuration matches the fourth one described in the linked article).
  • In most environments, RDMA provides enough bandwidth without the need of any traffic shaping. If using Windows Server 2012 R2, SMB Bandwidth Limits can optionally be used to shape the Storage and Live Migration traffic. More details can be found in item 4 of the following article: What’s new in SMB PowerShell in Windows Server 2012 R2. SMB Bandwidth Limits can also be used for configuration 1, but it's more common here.
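
To make Configuration 2 more concrete, here is a minimal, hedged sketch of the pieces specific to the RDMA path on a Windows Server 2012 R2 Hyper-V host: checking the RDMA NICs, switching Live Migration to SMB, and (optionally) capping Live Migration traffic with SMB Bandwidth Limits. The 2GB-per-second value is an assumption for illustration.

# Confirm which adapters are RDMA-capable (the SMB client will prefer these paths)
Get-NetAdapterRdma

# Configure Hyper-V to perform Live Migration over SMB (not the default setting)
Set-VMHost -VirtualMachineMigrationPerformanceOption SMB

# Optional (Windows Server 2012 R2): install the SMB Bandwidth Limits feature and cap Live Migration traffic
Add-WindowsFeature FS-SMBBW
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 2GB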

 

I hope this blog post helps with the network planning for your Private Cloud deployment. Feel free to ask questions via the comments below.

 

 

Automatic SMB Scale-Out Rebalancing in Windows Server 2012 R2


Introduction

 

This blog post focuses on the new SMB Scale-Out Rebalancing introduced in Windows Server 2012 R2. If you haven’t seen it yet, it delivers a new way of balancing file clients accessing a Scale-Out File Server.

In Windows Server 2012, each client would be randomly directed via DNS Round Robin to a node of the cluster and would stick with that node for all shares and all traffic going to that Scale-Out File Server. If necessary, some server-side redirection of individual IO requests could happen in order to fulfill the client request.

In Windows Server 2012 R2, a single client might be directed to a different node for each file share. The idea here is that the client will connect to the best node for each individual file share in the Scale-Out File Server Cluster, avoiding any kind of server-side redirection.

Now there are some details about when redirection can happen and when the new behavior will apply. Let’s look into the 3 types of scenarios you might encounter.

 

Hyper-V over SMB with Windows Server 2012 and a SAN back-end (symmetric)

 

When we first introduced the SMB Scale-Out File Server in Windows Server 2012, as mentioned in the introduction, the client would be randomly directed to one and only one node for all shares in that cluster.

If the storage is equally accessible from every node (what we call symmetric cluster storage), then you can do reads and writes from every file server cluster node, even if it’s not the owner node for that Cluster Shared Volume (CSV). We refer to this as Direct IO.

However, metadata operations (like creating a new file, renaming a file or locking a byte range on a file) must be orchestrated across the cluster and are executed on a single node, called the coordinator node or the owner node. Any other node will simply redirect these metadata operations to the coordinator node.

The diagram below illustrates these behaviors:

 

[Diagram] Figure 1: Windows Server 2012 Scale-Out File Server on symmetric storage

 

The most common example of symmetric storage is when the Scale-Out File Server is put in front of a SAN. The common setup is to have every file server node connected to the SAN.

Another common example is when the Scale-Out File Server is using a clustered Storage Spaces solution with a shared SAS JBOD using Simple Spaces (no resiliency).

 

Hyper-V over SMB with Windows Server 2012 and Mirrored Storage Spaces (asymmetric)

 

When using Mirrored Storage Spaces, the CSV operates in block-level redirected IO mode. This means that every read and write to the volume must be performed through the coordinator node of that CSV.

This configuration, where not every node has the ability to read/write to the storage, is generically called asymmetric storage. In those cases, every data and metadata request must be redirected to the coordinator node.

In Windows Server 2012, the SMB client chooses one of the nodes of the Scale-Out File Server cluster using DNS Round Robin and that may not necessarily be the coordinator node that owns the CSV that contains the file share it wants to access.

In fact, if using multiple file shares in a well-balanced cluster, it’s likely that the node will own some but not all of the CSVs required.

That means some SMB requests (for data or metadata) are handled by the node and some are redirected via the cluster back-end network to the right owner node. This redirection, commonly referred to as “double-hop”, is a very common occurrence in Windows Server 2012 when using the Scale-Out File Server combined with Mirrored Storage Spaces.

It’s important to mention that this cluster-side redirection is something that is implemented by CSV and it can be very efficient, especially if your cluster network uses RDMA-capable interfaces.

The diagram below illustrates these behaviors:

 

[Diagram] Figure 2: Windows Server 2012 Scale-Out File Server on asymmetric storage

 

The most common example of asymmetric storage is when the Scale-Out File Server is using a Clustered Storage Spaces solution with a Shared SAS JBOD using Mirrored Spaces.

Another common example is when only a subset of the file server nodes is directly connected to a portion of the backend storage, be it Storage Spaces or a SAN.

A possible asymmetric setup would be a 4-node cluster where 2 nodes are connected to one SAN and the other 2 nodes are connected to a different SAN.
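
If you want to check which IO mode each node is actually using for a given CSV (Direct IO versus file system or block redirected IO), the failover clustering cmdlets provide a quick view. A minimal sketch, run on one of the file server cluster nodes; the CSV name is hypothetical:

# Show the IO mode per node for every CSV (Direct, FileSystemRedirected or BlockRedirected)
Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo -AutoSize

# Or look at a single CSV and the reason for any redirection
Get-ClusterSharedVolumeState -Name "Cluster Disk 1" | Format-Table Node, StateInfo, BlockRedirectedIOReason -AutoSize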

 

Hyper-V over SMB with Windows Server 2012 R2 and Mirrored Storage Spaces (asymmetric)

 

If you’re following my train of thought here, you probably noticed that the previous configuration has a potential for further optimization and that’s exactly what we did in Windows Server 2012 R2.

In this new release, the SMB client gained the flexibility to connect to different Scale-Out File Server cluster nodes for each independent share that it needs to access.

The SMB server also gained the ability to tell its clients (using the existing Witness protocol) which node is ideal for accessing the storage, in case the storage happens to be asymmetric.

With the combination of these two behavior changes, a Windows Server 2012 R2 SMB client and server are able to optimize the traffic, so that no redirection is required even for asymmetric configurations.

The diagram below illustrates these behaviors:

 

[Diagram] Figure 3: Windows Server 2012 R2 Scale-Out File Server on asymmetric storage

 

Note that the SMB client now always talks to the Scale-Out File Server node that is the coordinator of the CSV where the share is.

Note also that CSV ownership is spread across nodes in the example. That is not a coincidence: CSV now includes the ability to distribute the CSVs across the nodes uniformly.

If you add or remove nodes or CSVs in the Scale-Out File Server cluster, the CSVs will be rebalanced. The SMB clients will then also be rebalanced to follow the CSV owner nodes for their shares.
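
Here is a minimal sketch of how you might watch this rebalancing from the Scale-Out File Server cluster: which node owns each CSV, and which file server node each SMB client is currently associated with through the Witness service. Output will naturally vary with your environment.

# Which node currently owns each CSV (ownership is distributed across the nodes)
Get-ClusterSharedVolume | Format-Table Name, OwnerNode -AutoSize

# Which file server node each SMB client is using, per share (via the Witness service)
Get-SmbWitnessClient | Format-Table ClientName, FileServerNodeName, ShareName -AutoSize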

 

Key configuration requirements for asymmetric storage in Windows Server 2012 R2

 

Because of this new automatic rebalancing, there are key new considerations when designing asymmetric storage (Mirrored or Parity Storage Spaces) for Windows Server 2012 R2.

First of all, you should have at least as many CSVs as you have file server cluster nodes. For instance, for a 3-node Scale-Out File Server, you should have at least 3 CSVs. Having 6 CSVs is also a valid configuration, which will help with rebalancing when one of the nodes is down for maintenance.

To be clear, if you have a single CSV in such asymmetric configuration in Windows Server 2012 R2 Scale-Out File Server cluster, only one node will be actively accessed by SMB clients.

You should also try, as much as possible, to have your file shares and workloads evenly spread across the multiple CSVs. This way you won’t have some nodes working much harder than others.

 

Forcing per-share redirection for symmetric storage in Windows Server 2012 R2

 

The new per-share redirection does not happen by default in the Scale-Out File Server if the back-end storage is found to be symmetric.

For instance, if every node of your file server is connected to a SAN back-end, you will continue to have the behavior described on Figure 1 (Direct IO from every node plus metadata redirection).

The CSVs will automatically be balanced across file server cluster nodes even in symmetric storage configurations. You can turn that behavior off using the cmdlet below, although I'm hard pressed to find any good reason to do it.

(Get-Cluster).CSVBalancer = 0

However, when using symmetric storage, the SMB clients will continue to each connect to a single file server cluster node for all shares. We opted for this behavior by default because Direct IO tends to be efficient in these configurations and the amount of metadata redirection should be fairly small.

You can override this setting and make the symmetric cluster use the same rebalancing behavior as an asymmetric cluster by using the following PowerShell cmdlet:

Set-ItemProperty HKLM:\System\CurrentControlSet\Services\LanmanServer\Parameters -Name AsymmetryMode -Type DWord -Value 2 -Force

You must apply the setting above to every file server cluster node. The new behavior won’t apply to existing client sessions.
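
Since the setting needs to be present on every node, here is a minimal sketch of pushing it out with the cluster and remoting cmdlets, assuming PowerShell remoting is enabled on the file server nodes:

# Apply the asymmetry override on every file server cluster node
Get-ClusterNode | ForEach-Object {
    Invoke-Command -ComputerName $_.Name -ScriptBlock {
        Set-ItemProperty HKLM:\System\CurrentControlSet\Services\LanmanServer\Parameters -Name AsymmetryMode -Type DWord -Value 2 -Force
    }
}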

If you switch to this configuration, you must apply the same planning rules outlined previously (at least one CSV per file server node, ideally two).

 

Conclusion

 

I hope this clarifies the behavior changes introduced with SMB Scale-Out Automatic Rebalancing in Windows Server 2012 R2.

While most of it is designed to just work, I do get some questions about it from those interested in understanding what happens behind the scenes.

Let me know if you find these details useful or if you have any additional questions.
