VMFS datastore showing greyed out, inactive and unmounted

February 21, 2013, 9:17 am

≫ Next: PowerShell: Ping test script (sequential) with CSV output

≪ Previous: Janet CSIRT Denial Of Service Briefing

I’ve been having “fun” recently with VMware storage issues following a temporary loss of power to one of my two EMC Clariion CX4-480 SANs. The power outage was expected, and the SAN was shut down before the electricity was lost. However, ever since I’ve been periodically getting hosts greying out in the vSphere client. They’d eventually come back to life again, and the VMs running on them didn’t seem affected – though they did grey out in vSphere client too and were thus not manageable, but it was worrying – something wasn’t happy.

One host (they’re all running ESXi 5.0 update 2, build 914586) in particular had had several of these issues, more so than the others. So this morning, when it was looking normal, I decided to try putting it into maintenance mode (which worked) and then rebooted it once all the VMs had been moved off it.

When it came back up again, I noticed that one of the VMFS datastores was greyed out. Looking in the host’s Configuration – Hardware – Storage tab, that datastore was listed in the Identification column as <Name> (inactive) (unmounted), and the Capacity, Free and Type columns were all showing N/A:

I found the following entries in the /var/log/vmkernel.log file by looking for the naa reference for the problem LUN/datastore:

~ # cat /var/log/vmkernel.log | grep naa.600601607c7028006891d164db7be011
2013-02-21T08:31:55.213Z cpu26:4122)ScsiDeviceIO: 2324: Cmd(0x4124425654c0) 0x16, CmdSN 0x99f1 from world 0 to dev "naa.600601607c7028006891d164db7be011" fled H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2013-02-21T08:31:55.213Z cpu4:5147)LVM: 11918: Failed to open device naa.600601607c7028006891d164db7be011:1

Trying to do things on these VMs that involved much disk activity caused the VM to lock up and stop responding, even to pings. The host running the VM would then grey out. This eventually happened to about half the VMs on the datastore.

Trying to do an ls (directory listing) on the datastore via SSH to any host also failed after a few minutes.

According to VMware KB 289902 the H:0×5 is the Host status (Initiator), and means SG_ERR_DID_ABORT, aka Told to abort for some other reason, which is less than helpful.

Then I tried searching again and found this blog post, where it is mentioned that one of the steps early in the resolution was to fail over the SAN storage controller. Trespassing a LUN on the Clariion is pretty quick and easy to do. The problem LUN was owned by SPB, its default SP, so I trespassed it, which moved it to SPA. A few seconds later the VMs stored on the datastore started springing back to life,and the hosts recovered too, so that seems to have fixed it. I didn’t need to do any of the other stuff listed in Shabir Yusuf’s blog.

The host that had been rebooted was still showing the LUN as inactive and unmounted, so I right-clicked it and chose “Mount”, and it seems fine now.

↧

PowerShell: Ping test script (sequential) with CSV output

February 27, 2013, 11:38 am

≫ Next: PowerShell: Delete a Task Scheduler task

≪ Previous: VMFS datastore showing greyed out, inactive and unmounted

This is a script that sequentially tests the specified hosts and writes the output to a CSV file. Unlike the parallel ping script I posted a few days ago, this one writes the output of all the tests the the same file, which whilst not as efficient whilst doing the tests, makes producing graphs in Excel from the results much easier.

Here’s the script:

$Computers = "192.168.0.1","rcmtech.co.uk","bbc.co.uk","virginmedia.com"
$SleepTime = 5
$CSVFile = "$env:UserProfile\Desktop\PingTest.csv"
while($true){
    $Now = Get-Date
    $ResultObject = New-Object PSObject
    $ResultObject | Add-Member NoteProperty Time $Now
    $i = 1
    foreach($Computer in $Computers){
        Write-Progress -Activity "Ping Test" -Status "Pinging $Computer" -PercentComplete ($i/$Computers.Count*100)
        $TestResult = Test-Connection $Computer -Count 1 -ErrorAction SilentlyContinue
        if($TestResult.ResponseTime -eq $null){
            $ResponseTime = -1
        } else {
            $ResponseTime = $TestResult.ResponseTime
        }
        $ResultObject | Add-Member NoteProperty $Computer $ResponseTime
        $i++
    }
    Write-Progress -Activity "Ping Test" -Status "Write results to file" -PercentComplete 100
    $ResultObject | Format-Table -AutoSize
    Export-Csv -InputObject $ResultObject -Path $CSVFile -Append -NoTypeInformation
    for($DelayTime=0; $DelayTime -lt $SleepTime; $DelayTime++){
        Write-Progress -Activity "Ping Test" -Status "Sleeping" -SecondsRemaining ($SleepTime-$DelayTime)
        Start-Sleep 1
    }
}

↧

PowerShell: Delete a Task Scheduler task

March 1, 2013, 2:08 am

≫ Next: Windows Server 2008 R2: Windows Search Service

≪ Previous: PowerShell: Ping test script (sequential) with CSV output

This code allows you to delete a task from any folder within the Windows Task Scheduler, it’s therefore much more flexible (if more long winded) then using the Unregister-ScheduledJob cmdlet which only allows you to access jobs in the folder: Task Scheduler Library\Microsoft\Windows\PowerShell\ScheduledJobs

# create Task Scheduler COM object
$TS = New-Object -ComObject Schedule.Service
# connect to local task sceduler
$TS.Connect($env:COMPUTERNAME)
# get tasks folder (in this case, the root of Task Scheduler Library)
$TaskFolder = $TS.GetFolder("\")
# get tasks in folder
$Tasks = $TaskFolder.GetTasks(1)
# define name of task to delete
$TaskToDelete = "MyTask"
# step through all tasks in the folder
foreach($Task in $Tasks){
    if($Task.Name -eq $TaskToDelete){
        Write-Host ("Task "+$Task.Name+" will be removed")
        $TaskFolder.DeleteTask($Task.Name,0)
    }
}

↧

Windows Server 2008 R2: Windows Search Service

March 7, 2013, 3:40 am

≫ Next: NetBackup bpfis and bpbkar32 memory issues with VMware backups

≪ Previous: PowerShell: Delete a Task Scheduler task

Just added this to a server with the intention of indexing some NetBackup log files to make it easier to find things in them. Note that Windows Search Service is new with Server 2008 R2, and is not the same as the old Indexing Service from previous versions on Windows up to Server 2008. You also can’t have both the new Windows Search Service and the old Indexing Service on the same machine. Windows Search Service is a Role Service of the File Server role, so you need to add that to get Windows Search Service.

After you add the Windows Search Service, you need to go to Control Panel and use the Indexing Options to configure which folders you want it to index. Except Indexing Options wasn’t showing in Control Panel. Logging off and back on again makes it show.

I removed all the default indexed locations and just added the logs folder that I wanted indexed. In Indexing Options – Advanced – File Types I changed the option for .log files to “Index Properties and File Contents”.

Finally, I created a new library in Windows Explorer for the logs folder to allow easy access.

I’m not sure how big the index will get, there are a lot of log files. I’ll monitor this and see how it goes, but I’m hoping it’ll prove useful.

↧

NetBackup bpfis and bpbkar32 memory issues with VMware backups

March 12, 2013, 1:34 am

≫ Next: PowerShell: QuickAddress Pro data expiry checker

≪ Previous: Windows Server 2008 R2: Windows Search Service

I noticed events mentioning low memory conditions on my NetBackup 7.1 media servers, during the times when they’d be busy doing evening backups. These servers have 32GB of RAM, so I was surprised to see them saying out of memory.

Log Name:      System
 Source:        Microsoft-Windows-Resource-Exhaustion-Detector
 Date:          06/03/2013 19:52:44
 Event ID:      2004
 Task Category: Resource Exhaustion Diagnosis Events
 Level:         Warning
 Keywords:      Events related to exhaustion of system commit limit (virtual memory).
 User:          SYSTEM
 Computer:      NBU-MEDIA01.rcmtech.co.uk
 Description:
 Windows successfully diagnosed a low virtual memory condition.
 The following programs consumed the most virtual memory:
 bpfis.exe (5316) consumed 113549312 bytes,
 bpbkar32.exe (12964) consumed 113000448 bytes,
 and bpfis.exe (13224) consumed 110944256 bytes.

I found article TECH161502 in the Symantec knowledge base, which refers to an issue with “larger” VMware environments and releasing information from memory.

I’ve added the registry DWORD value mentioned (HKEY_LOCAL_MACHINE\SOFTWARE\Symantec\SymVmTools\SymVmToolsDisableRefresh = 1) to my Master and two Media servers and have not seen bpfis.exe mentioned in the event log since.

I have had one instance of event 2004 logged since I made the registry change, on the evening of the day that I made the registry changes. However, this time it only mentioned bpbkar32.exe. Before I got rid of the bpfis.exe memory issue, and still afterwards, I’ve also been getting quite a few (~ 20/day) errors saying that bpbkar32 terminated unexpectedly, again during the peak period of my evening backups. I’ve found TECH186680 which mentions this issue, but nearly a year on and there’s still no fix.

↧

PowerShell: QuickAddress Pro data expiry checker

March 13, 2013, 6:18 am

≫ Next: PowerShell: Sleep with progress bar

≪ Previous: NetBackup bpfis and bpbkar32 memory issues with VMware backups

QAS Pro (formerly QuickAddress Pro) is a database of addresses that allows you to do postcode lookups etc. Its database is updated every few months, and the databases have expiry dates built in – if you don’t keep the database reasonably up to date then it stops working.

It comes with a .jsp file that allows you to query various things about the product via the web server that QAS is running on. Go to

http://<your-qas-server>:<qasport>/proweb/test.jsp

to view the page (port is 8080 on my server, not sure if that’s the default, was ages ago when this was set up). One of the things listed on that page is the data expiry, shown as days remaining.

Below is a powershell script that pulls the test page, gets the days remaining via a regular expression (regex) and displays just that value to screen. I’ve also used a switch statement to break down the result into a traffic light representation, you could do other stuff in this too like send emails, update a separate monitoring system etc.

# variables for the script 
$YellowLevel = 14 
$RedLevel = 7 
$SleepHours = 1
# change window title 
$Host.UI.RawUI.WindowTitle = "QAS DB Expiry Checker"
# loop indefinitely 
while($true){
     #get web page
     $Page = (Invoke-WebRequest "http://qaswebserver.rcmtech.co.uk:8080/proweb/test.jsp").Content
     # look for text on web page using regular expression
     if($Page -match "[0-9]+ days"){
         # get matching text string from full web page text
         $FullText = (Select-String -InputObject $Page -Pattern "[0-9]+ days").Matches.Value
         Write-Host (get-date),"QAS remaining: $FullText - " -NoNewline
         # get just the number of days
         $DaysString = (Select-String -InputObject $FullText -Pattern "[0-9]+").Matches.Value
         # convert number of days to integer to allow numeric matching operations, e.g. "greater than"
         $DaysInteger = [convert]::ToInt16($DaysString)
         # interrogate the number of days and set status based on what its value is
         switch($DaysInteger){
             {$_ -gt $YellowLevel} {
                 Write-Host "Green" -ForegroundColor Green
             }
             {($_ -le $YellowLevel) -and ($_ -gt $RedLevel)} {
                 Write-Host "Yellow" -ForegroundColor Yellow
             }
             {$_ -le $RedLevel} {
                 Write-Host "Red" -ForegroundColor Red
             }
         }
     } else {
         # page did not contain expected text
         Write-Host "Error with page" -ForegroundColor Yellow
     }
     # wait for specified time
     Start-Sleep -Seconds ($SleepHours * 60 * 60)
 }

The script shows how to change the title of the PowerShell window.
It demonstrates how to use Invoke-WebRequest to get the contents of a web page into a variable.
It also demonstrates how to extract text from a string using a regular expression.
You can also see how to use integers with “greater than” and “less than or equal to” with the switch statement.

↧

PowerShell: Sleep with progress bar

March 13, 2013, 9:08 am

≫ Next: PowerShell: Scan subnet for dead addresses

≪ Previous: PowerShell: QuickAddress Pro data expiry checker

Wrote this function to provide a common way to add sleep and give progress updates.

Set the sleep time by creating a hash table (aka dictionary) specifying the hours/minutes/seconds you want to pause your script for. You don’t have to specify all of hours, minutes or seconds, just one or more. Call the SleepProgress function with your hash table as the parameter.

function SleepProgress([hashtable]$SleepHash){
     [int]$SleepSeconds = 0
     foreach($Key in $SleepHash.Keys){
         switch($Key){
             "Seconds" {
                 $SleepSeconds = $SleepSeconds + $SleepHash.Get_Item($Key)
             }
             "Minutes" {
                 $SleepSeconds = $SleepSeconds + ($SleepHash.Get_Item($Key) * 60)
             }
             "Hours" {
                 $SleepSeconds = $SleepSeconds + ($SleepHash.Get_Item($Key) * 60 * 60)
             }
         }
     }
     for($Count=0;$Count -lt $SleepSeconds;$Count++){
         $SleepSecondsString = [convert]::ToString($SleepSeconds)
         Write-Progress -Activity "Please wait for $SleepSecondsString seconds" -Status "Sleeping" -PercentComplete ($Count/$SleepSeconds*100)
         Start-Sleep -Seconds 1
     }
     Write-Progress -Activity "Please wait for $SleepSecondsString seconds" -Completed
 }
$SleepTime = @{"Seconds" = 30;"Minutes" = 1}
SleepProgress $SleepTime

This script demonstrates the use of switch, write-progress and hash tables.

↧

PowerShell: Scan subnet for dead addresses

March 15, 2013, 3:17 am

≫ Next: Bye-bye SAN, Hello Tintri

≪ Previous: PowerShell: Sleep with progress bar

Quick script to scan a range of addresses on an IPv4 subnet and report ones that don’t ping, additionally doing a reverse DNS query on those IP addresses to try and get their names.

I’m doing two pings, as I found that when only doing one I got a lot of false positives. The reverse DNS query slows things down quite a bit so comment that out if you don’t need it.

$Subnet = "164.11.249"
$MinAddress = 47
$MaxAddress = 254
# step through each address from MinAddress to MaxAddress
for($Address=$MinAddress;$Address -le $MaxAddress;$Address++){
    # make a text string for the current IP address to be tested
    $TestAddress = $Subnet+"."+[convert]::ToString($Address)
    # do the ping(s), don't display red text if the ping fails (erroraction)
    $Result = Test-Connection -ComputerName $TestAddress -Count 2 -ErrorAction SilentlyContinue
    if($Result -eq $null){
        Write-Host "No reply from $TestAddress" -NoNewline
        # have to use try/catch as the GetHostByAddress errors in a nasty way if it doesn't find anything
        try{
            # do the reverse DNS query
            $HostName = [System.Net.Dns]::GetHostByAddress($TestAddress).HostName
            if($HostName -ne $null){
                # found something in DNS, display it
                Write-Host " ($HostName)"
            }
        } catch {
            # didn't find anything (GetHostByAddress generated an error), just do a CRLF
            write-host ""
        }
    }
}

This script demonstrates the use of reverse DNS queries, a try catch statement, and converting integers to strings.

↧

Bye-bye SAN, Hello Tintri

March 18, 2013, 3:00 pm

≫ Next: Enterprise Storage Part 1 – What are IOPS, and other basics

≪ Previous: PowerShell: Scan subnet for dead addresses

I first came across Tintri over a year ago, had a look at their product on their website and thought “this sounds rather good”. I’ve chatted to them at IP Expo for the last two years, and recently had a WebEx with a few of their guys and a handful of my colleagues, where they went into some detail about what the product does, plus a quick demo.

If you’re currently running VMware VMs from VMFS volumes presented from something like an EMC Clariion SAN then you really want to look very seriously at the Tintri T540.

What is it? 13.5TB of storage in a 3U box. Fast storage, 50k-75k IOPS. Designed for VMware (Hyper-V support coming soon).

How do you get that amount of performance from 3U? You combine SSD and HDD. The T540 has eight 300GB SSDs in RAID 6 (striped with two parity disks) and eight 3TB HDDs, also in RAID 6. Nothing new there, except that Tintri have written their own optimised filesystem called VMstore featuring inline dedupe and some clever MLC flash optimisations. Without this, the Flash wears out quite fast due to the high random writes caused by your VMs hammering away all the time. Even if your VMs are writing mostly sequential data, when you combine many sequential writes together the component disks in the array see a random workload. SSD loves random workloads.

The data within the Tintri is structured such that all writes go to SSD, the data is then migrated to HDD where appropriate in 8k blocks – this is the size they found worked best during testing, and conveniently works nicely with most enterprise apps, including SQL Server and Exchange. Tintri claim to be able to service around 99% of IO from SSD. You can also pin entire VMs or just individual .vmdk files to sit only on SSD, should you want to.

Let’s contrast this to how a Clariion works. The Clariion was designed to be able to service defined, fairly consistent workloads, e.g. a single database on a set of disks. Sure you can make more than one LUN from a set of disks, but you need to understand the workload of each LUN such that one doesn’t negatively impact another. Try doing that for hundreds of VMs all on the same pool/RAID Group… On the Clariion, all writes smaller than the write-aside value (default 1MB) go via the storage processor RAM cache. On a CX4-480 this is about 4GB. There is (usually) a tiny amount of read-ahead cache allocated too, but it’s small because it hardly ever gets used. Back to the write cache. If this fills above the high watermark (default 80%) the SP initiates flushing to disk until the cache falls below the low watermark (default 60%). If you perform a read and the data you want is still held in the write cache, the read will be serviced from the cache, otherwise it’ll come from disk. In reality this means basically all writes come from disk of one form or another. Think about how much provisioned storage you have in your Clariion (it’ll be TB) vs a tiny 4GB write cache.

The Clariion has a couple of data tiering features that you might think would help, and they might, but possibly not as well as you might be led to believe, especially if your workload is servicing VMs. These are FAST and FAST Cache:

FAST (Fully Automated Storage Tiering) moves data around between different types of disk within a Pool, so you could have SSD, FC HDD and SATA HDD. It does not work on RAID Group LUNs as you can only have one type of disk within a RAID Group. The tiering chunk size is 1GB – which is nowhere near granular enough for the jumbled mass of data that you’ll have if you’ve been thin-provisioning your vmdk files. Writes will go to whichever tier the 1GB chunk that holds the block you’re addressing is currently sat on – so possibly SATA.
FAST Cache is a misnomer, it’s not a cache at all. It’s enabled per LUN, and allows up to 2TB of SSD to hold promoted 64kB blocks of data. If you enable FAST Cache for some LUNs, the FAST Cache algorithm monitors the blocks on those LUNs and gradually promotes the most active blocks to SSD. How busy a block needs to be to determine if it’ll be promoted is based on it’s relative “busy-ness” compared to all other blocks on FAST Cache-enabled LUNs across the entire SAN. Writes to otherwise untouched blocks do not go direct to SSD, and writes to relatively quiet blocks do not go direct to SSD. Only writes to already busy blocks that have been promoted will go direct to SSD.

I’m not as familar with other SAN vendors tech as I am with the stuff from EMC but a lot of the above will be similar, if not the same. Check out the granularity of the tiering mechanism, and see if the SSD “cache” is actually a cache.

The Tintri T540 is presented to your ESXi hosts as a single large NFS datastore, via dual controllers (active/standby config) each with two 1/10Gbps Ethernet interfaces (also in active/standby config). Due to the use of NFS and thus Ethernet, you could potentially not only ditch your SAN, but also your FibreChannel fabric(s) too. (Oh, and your storage team… oops)

There’s no more creating LUNs then adding them as datastores, no upgrading VMFS versions, managing SAN RAID Groups or Storage Pools. The Tintri T540 also understands what the files in the NFS datastore are – it knows what makes up a VM and allows you to see how busy a VM is in a variety of different ways, and apply Quality of Service to it.

As you can probably tell, moving your VMs to a T540 will massively reduce complexity and management overhead. Yes – you can adjust the placement of VMs and set QoS, but unless you have some specific requirements this box is “set it and forget it”. Because the T540 integrates with vCenter and understands the data it’s holding, it’s not “just another” SSD+HDD box. It has a load of performance overview and reporting features to make the VM admin’s life easier. Because of the massive amount of performance it can soak up and throw out think how nice it’d be to run SQL Server or Exchange from it.

The T540 makes provisioning a new VM from the vSphere Client super-fast. We’re talking seconds. This is achieved via supporting VAAI. It also has built-in cloning and snapshot capabilities, the latter can be scheduled. Think how handy that speed could be for devs who need a constant supply of new servers or who want to snapshot a VM before they roll out an update.

Check out the features for yourself.

At the head of Tintri the company are people from VMware, Sun, and Data Domain. In addition to those companies the engineering team has people from Citrix, NetApp, Google and Brocade. They know their stuff, and know the storage issues that face anyone who’s taken advantage of server virtualisation whilst having to use legacy SAN/NAS technology.

Clearly, you need to have some idea of what your environment is doing to know if your storage workload would be suitable, but I reckon it would be perfect for a lot of people who’ve had no choice apart from traditional SAN or NAS until now. And if you have a VM-based VDI solution then Tintri could solve all your storage provisioning and boot storm issues.

Oh, and they’re winning awards all over the place. I want one. No, I want two.

↧

Enterprise Storage Part 1 – What are IOPS, and other basics

March 18, 2013, 3:17 pm

≫ Next: vSphere: Convert RDM to VMDK and vice-versa on Windows VM with no downtime

≪ Previous: Bye-bye SAN, Hello Tintri

If you’re specifying storage for enterprise class applications, or more recently, looking at specs of Solid State Disks – even ones for your PC, you’ll see a lot of mentions of IOPS. What exactly are IOPS?

IOPS stands for Input/Output operations Per Second, or IOs Per Second, and is a way of measuring the amount of data you can get into or out of a disk or storage system.

So how big is an IO? The simple answer is that it depends. By IOs we really mean “IO instruction”, ignoring the amount of data. Thus you can get different sized IOs, as on the whole an IO instruction with no associated data is not going to be much use when you’re trying to get data into or out of a storage device.

The chunks of data dealt with by individual IOs tend to be relatively small, ranging between about 512 bytes to 64K bytes. The performance figures for disks will, if you look hard enough, give you detailed figures for the IOPS for a range of different sizes. Consumer SSDs tend to at least mention 4KB IOs.

In the case of individual disks, in physical terms it will clearly take longer to write a larger amount of data to a disk then to write a smaller amount. Think of a spinning disk, if you write 512 bytes to the magnetic surface the disk platter will only have to rotate a small amount between where it starts to write and when it has finished, whereas to write 64KB will take somewhat longer (roughly 64,000 / 512 = 125 times longer). This is why enterprise-grade disks tend to spin faster than consumer disks. A standard consumer disk will spin at 7200rpm whereas an enterprise SAS (Serial Attached SCSI) or FC (Fibre Channel) will spin at 15,000rpm.

But that’s not the only factor that determines how many IOs a device can handle. In the case of spinning magnetic disks, you also have to take into account the “seek time”. This is how long it takes for the disk controller to move the read/write head to the correct place on the surface of the spinning disk, ready to start reading or writing data. For consumer disks this tends to be around 11ms, for decent enterprise grade disks it’ll be more like 4ms. 4ms might not sound like a lot, but it really is a very long time, especially if you are needing to read and write hundreds of times a second at random places over the surface of the disk. This seek time delay is what solid state drives effectively remove, and is one of the main reasons (when compared to magnetic disks) why their IOPS figures tend to be so high, especially for random read operations.

There are a few other factors that can affect IOPS figures for a particular disk, and one of the more interesting ones is the number of IOs that a disk can queue up. This is called Native Command Queuing (NCQ) or Tagged Command Queuing (TCQ). Enterprise disks can usually queue 32 instructions, whereas historically consumer grade disks tended not to queue more than about 2 (this changed with the introduction of the SATA interface). The reason this is important is because if the disk knows what it’s going to be asked to do in the future, it gives it the possibility of re-jigging the order of executing those instructions to take advantage of the physical characteristics of the storage medium. As a very basic example, imagine instructions as follows:

Read 4k from position 0
Write 8K to position 10000
Read 16k from position 2
Read 4k from position 10050

Executed in order, the heads on the disk will be moving rapidly from position 0 to position 10000 to position 2 to position 10050. If the device was able to see the entire queue it might instead choose to execute in the following order:

Read 4k from position 0
Read 16K from position 2
Write 8K to position 10000
Read 4K from position 10050

This has turned out to be a much longer article than I originally envisaged, so I’m going to leave it there. At some point I’ll write part 2 and cover things like RAID levels and cache.

↧

vSphere: Convert RDM to VMDK and vice-versa on Windows VM with no downtime

April 23, 2013, 5:34 am

≫ Next: Script comparison: CMD, VBScript, Powershell

≪ Previous: Enterprise Storage Part 1 – What are IOPS, and other basics

This method will work for non-OS disks (so not your C drive), and will not required the server or application to go down during the process. You need a version of Windows that supports software RAID, which includes all versions of Windows Server and some Client versions too.

You had a good reason to use an RDM (or maybe you didn’t!) but have now changed your mind, or your needs have changed, or your storage infrastructure has changed (perhaps you’re replacing your old Clariion with a Tintri).

Or, maybe you provisioned a machine with a VMDK disk but now it’s gone live or the system performance needs have ramped up and you need to switch it to some dedicated spindles.

We’re going to be using Windows software RAID 1 to copy the data over, this is a block-level process so will transfer all the data and permissions across. It’s mirroring the entire NTFS filesystem, and it does it online. Clearly you’re going to take a bit of a hit in performance whilst the mirror is synchronising, but we’ll be breaking it again as soon as it’s finished. Your data disk will be converted to and remain as a Dynamic disk if it wasn’t already, but this effectively makes no difference (as long as you don’t want to access it from DOS or Linux).

So here we go, with screenshots (wow!). This is an RDM to VMDK conversion, but it’s the same process to go from VMDK to RDM.

Here’s the current VM config:

Hard disk 2 is the data disk, and is the one we’ll be switching to a VMDK.
Here’s the disk config in Windows:
Make a note of the size of the disk that you’re going to transfer. As with all RAID 1, the disk you’re going to mirror onto must be the same size (or larger) than the existing disk. My disk is 10Gb, so I’m going to add a new 10GB VMDK disk to the VM:
Here’s the new disk in Windows Disk Management:
First you need to bring the disk online. Right-click the disk (where it says Disk 2, Unknown, 10.00GB, Offline) and choose Online.
Now you need to initialise the disk, right-click again and choose Initialise:
Now that we have our new disk ready for use, we can add it as a mirror to the original (RDM) disk. Right-click the RDM disk and choose Add Mirror…:
Select the new unallocated disk:

Click Add Mirror.
If your original disk is currently a Basic disk you’ll be warned that the disk will be converted to a dynamic disk:

Click Yes.
The original disk and the new unallocated disk will be converted to dynamic disks:

And very shortly afterwards the mirroring process will commence:

How long this takes depends on the speed of your disks. In my experience there is very little CPU load generated by this process.
Once the resynching process has completed both disks will show as Healthy:
Now it’s time to remove the mirror from the original disk. Right-click it and choose Remove Mirror…:

The Remove Mirror dialogue box will open, it selects the disk that you right-clicked but just confirm that it’s the right one:

Click Remove Mirror, then click Yes.
Now you’ll be running from the new VMDK disk, the old RDM disk will be showing as unallocated:
Before removing the disk from the VM I like to right-click it and put it offline:
Now edit the settings of the VM and remove the RDM disk:

Choose the “and delete files from disk” option as this will delete the .rdmp file.
Finally, here is the view from Windows Disk Management showing the VM that’s now only seeing the (new) VMDK disk:

↧

Script comparison: CMD, VBScript, Powershell

May 15, 2013, 1:29 am

≫ Next: VMM 2012 SP1: Create Virtual Switch fails with Unknown error 0×80041008

≪ Previous: vSphere: Convert RDM to VMDK and vice-versa on Windows VM with no downtime

This was going to be a short scripting exercise for a new member of staff, but has been ditched due to time constraints. I’d written some sample scripts to see how long it’d take, so thought I’d post them as interesting comparison of the three languages.

They’re perhaps more a reflection of my abilities than being textbook examples of the three languages. I know there are better ways that bits of them could be done, but these were knocked up by me in as little time as possible so needed to be functional rather than perfect.

The brief was to write a short script, around 10-20 lines, that would move files from one folder to another if they had a certain text string in the filename. The folders were both located in the current user’s Documents folder. Source folder was called TestFiles, destination folder for matching files was called NewFiles. The text string to look for in the filenames was “robin” (minus the quotes). I wanted the script to be easily changed to use different folders or to look for a different text string. I also wanted the script to write some basic activity information to screen.

CMD (Windows Command processor/batch file)

@echo off
set Folder=%userprofile%\Documents\TestFiles
set NewFolder=%userprofile%\Documents\NewFiles
set FindText=robin
for /f "delims=" %%i in ('dir /b %Folder%') do (
   echo %%i
   call :FindAndMove "%Folder%\%%i" %FindText%
)
:FindAndMove
echo %1 | find /i "%2" >nul:
if %errorlevel%==0 (
   move %1 %NewFolder%\ >nul:
   echo Moved to new location
)
goto :eof
echo Done.

VBScript

Option Explicit
Dim oFSO, oFolder, oShell, oFile
Dim sFolderPath, sFindText, sNewFolder
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oShell = CreateObject("WScript.Shell")
sFolderPath = "%userprofile%\Documents\TestFiles"
sNewFolder = "%userprofile%\Documents\NewFiles"
sFindText = "robin"
sFolderPath = oShell.ExpandEnvironmentStrings(sFolderPath)
sNewFolder = oShell.ExpandEnvironmentStrings(sNewFolder)
Set oFolder = oFSO.GetFolder(sFolderPath).Files
For Each oFile In oFolder
   WScript.Echo oFile.Name
   If InStr(oFile.Name,sFindText) Then
      oFSO.MoveFile oFile.Path,sNewFolder&"\"
      WScript.Echo "Moved to new location"
   End If
Next
WScript.Echo "Done."

PowerShell

$Folder = $env:USERPROFILE+"\Documents\TestFiles"
$NewFolder = $env:USERPROFILE+"\Documents\NewFiles"
$FindText = "robin"
$FolderObj = Get-ChildItem -Path $Folder
foreach($File in $FolderObj){
   Write-Host $File.Name
   if($File.FullName -match $FindText){
      Move-Item -Path $File.FullName -Destination $NewFolder
      Write-Host "Moved to new location"
   }
}
Write-Host "Done."

↧

VMM 2012 SP1: Create Virtual Switch fails with Unknown error 0×80041008

May 15, 2013, 7:47 am

≫ Next: PowerShell: Quick way to tell if a particular hotfix is installed – local and remote

≪ Previous: Script comparison: CMD, VBScript, Powershell

This seems to be because you’re trying to create a Virtual Switch that uses teamed physical NICs that are already part of a Windows Server 2012 Team. This might be because you deleted on old Virtual Switch via Hyper-V Manager’s Virtual Switch Manager. Doing this doesn’t remove any Windows native NIC team that the switch was connected to.

Run LbfoAdmin (or click Server Manager, Local Server, and click the text saying “Enabled” to the right of “NIC Teaming”). Then investigate whether the team showing is using the NICs you were trying to allocate to the new Virtual Switch in VMM 2012 SP1. If they’re in use, you will not be able to use them and will get the following error when you try and add the Virtual Switch:

An internal error has occurred trying to contact the hv01.rcmtech.co.uk server:

WinRM: URL: [http://hv01.rcmtech.co.uk:5985], Verb: [GET], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/scvmm/ErrorInfo?ID=1003]

Unknown error (0x80041008)

Recommended Action
Check that WS-Management service is installed and running on server test-hv01.test.local. For more information use the command "winrm helpmsg hresult". If hv01.rcmtech.co.uk is a host/library/update server or a PXE server role then ensure that VMM agent is installed and running.

There are likely no problems with WS-Management (test with the command winrm qc on the host).

The solution is therefore to remove the NICs from the team (or remove the team, and thus free the NICs).

↧

PowerShell: Quick way to tell if a particular hotfix is installed – local and remote

May 16, 2013, 3:28 am

≫ Next: Productivity: People are like hard disks

≪ Previous: VMM 2012 SP1: Create Virtual Switch fails with Unknown error 0×80041008

A colleague just asked me if there was a quick and easy way to find out if a hotfix was installed on a Windows machine. The answer is “yes, use PowerShell”.

Get-HotFix

will list all the hotfixes installed on the computer. You can then filter this by using Where-Object to Match part of the hotfix ID:

Get-HotFix | Where-Object -Property HotfixID -Match "2823180"

or use EQ to find only fully matching hotfix IDs:

Get-HotFix | Where-Object -Property HotfixID -EQ "KB2823180"

or if you want to shorten the command line:

Get-HotFix | where HotfixID -match "2823180"

You can obviously do this query remotely too, using Enter-PSSession:

Enter-PSSession -Computername rcmhv01 -Credential rcmtech-admin
Get-HotFix | Where-Object -Property HotfixID -EQ "KB2823180"
Exit-PSSession

Note that the Credential will assume the current Active Directory domain, but you can prefix the username with alternativedomain\ if you want to use a different one. The credential prompt will pop up an authentication dialogue to ask for the password. If you’re logged on with an account that has sufficient permissions (e.g. administrator) on the remote machine then you can omit the Credential parameter completely. The PowerShell prompt will change after the Enter-PSSession command to indicate that you’re now executing commands on a different machine.

Or do it on one line:

Invoke-Command -ComputerName rcmhv01 -Credential rcmtech-admin -ScriptBlock {Get-HotFix | Where-Object -Property HotfixID -EQ "KB2823180"}

If you’re having issues running commands remotely then check the steps here.

↧

Productivity: People are like hard disks

May 24, 2013, 5:26 am

≫ Next: Get VMware VM creation date (kind of)

≪ Previous: PowerShell: Quick way to tell if a particular hotfix is installed – local and remote

I just read a fairly common sense article from HP’s latest Technology at Work newsletter.

It prompted me into writing down some thoughts on human productivity that I’ve had for a while that are nicely relevant from an IT Pro’s perspective: that humans are like (mechanical) hard disks.

The gist of HP’s article is that productivity decreases as multitasking increases: try to switch from one task to another too frequently and you become increasingly “busy” but your productivity drops right off.

Now think about how fast you can get data into or out of a hard disk if you’re streaming to/from a single large file – it’s actually not so bad. But make that disk start trying to read and write from multiple files (or a single fragmented file) and the seek time kills the data throughput. The time the read/write heads spend being moved from one place to another becomes so significant that the disk spends a higher proportion of its time moving the heads around than it does actually reading/writing data.

To help improve the efficiency of a disk it is accessed via a controller that buffers the requests, and can even re-order and prioritise them. The human equivalent of this is (should be!) your manager.

Continuing this analogy, we could start to consider the different ways that teams of people operate as RAID levels, but perhaps that’s going a little too far!

↧

Get VMware VM creation date (kind of)

May 29, 2013, 8:39 am

≫ Next: SCVMM 2012 SP1: The virtual machine requires NUMA spanning to be disabled

≪ Previous: Productivity: People are like hard disks

VMware vSphere doesn’t seem to keep the VM creation date in its database anywhere, but if your VMs are all/mostly running Windows (and you have permission to do a WMI query against them) you can get the OS install date, which probably aligns fairly closely to the VM creation date. It won’t be perfect, but it might be good enough, and better than nothing.

This PowerShell/PowerCLI script attempts to pull information out of your VMs, and allows you to fairly easily graph (via Excel) how your VMware environment has grown over time. Note that in its current form the script assumes that the VM name is the same as the Windows computer name.

The script gives you the following columns in its CSV file output:

VM name
Windows OS install date (or a reason why it wasn’t able to get it)
Number of vCPUs for the current VM
vRAM allocated to the current VM

My suggestion for graphing is as follows: sort the output by the InstallDate column, ditch the rows where the date couldn’t be retrieved. Shift the CPU and RAM columns three columns to the right to give a three column gap. In the column immediately next to the date, have a “Count” column – a simple incrementing number to keep count of how many VMs you had on that date, starting at one and going up by one per row (i.e. 1, 2, 3, 4, …). Then in the other two columns use simple formulas to keep a running total of the amount of vRAM and vCPUs that you have. To get the graph, select the column heading for the InstallDate column, then Ctrl-click on the column headings for the Count, TotalvRAM and TotalvCPU columns one at a time. You can then insert a line graph, and it’ll look halfway decent (albeit needing some tidying up and adjustment of the labelling of the axes):

Clearly if you have converted any physical machines to VMs then the install date will be that of the physical server, not the VM creation, so you might want to ignore anything before whatever date you went 100% VM, but it’s still better than nothing!

function LoadVMwareSnapins{
    $snapinList = @( "VMware.VimAutomation.Core", "VMware.VimAutomation.License", "VMware.DeployAutomation", "VMware.ImageBuilder", "VMware.VimAutomation.Cloud")

    $loaded = Get-PSSnapin -Name $snapinList -ErrorAction SilentlyContinue | % {$_.Name}
    $registered = Get-PSSnapin -Name $snapinList -Registered -ErrorAction SilentlyContinue  | % {$_.Name}
    $notLoaded = $registered | ? {$loaded -notcontains $_}

    foreach ($snapin in $registered) {
        if ($loaded -notcontains $snapin) {
            Add-PSSnapin $snapin
        }
    }
}

function GetInstallDate([string]$Computer){
    Write-Host "Processing $Computer"
    if(Test-Connection -ComputerName $Computer -Count 1 -ErrorAction SilentlyContinue){
        $OS = Get-WmiObject -Class Win32_OperatingSystem -ComputerName $Computer -ErrorAction SilentlyContinue
        if($OS -ne $null){
            $InstallDate = $OS.ConvertToDateTime($OS.Installdate)
            Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name InstallDate -Value $InstallDate
        } else {
            Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name InstallDate -Value "WMI fail"
        }
    } else {
        Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name InstallDate -Value "Ping fail"
    }
}

$CSVFile = "$env:UserProfile\Desktop\VMAudit2.csv"
$TotalVMs = 0
$TotalvRAM = 0
$TotalCPUs = 0

LoadVMwareSnapins
Connect-VIServer -Server vsphere
$VMs = Get-VM

foreach($VM in $VMs){
    $ResultObject = New-Object -TypeName PSObject
    Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name Computer -Value $VM.Name
    if($vm.PowerState -eq "PoweredOn"){
        GetInstallDate $VM.Name
    } else {
        Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name InstallDate -Value "Powered off"
    }
    $TotalVMs = $TotalVMs + 1
    $TotalvRAM = $TotalvRAM + $VM.MemoryGB
    $TotalCPUs = $TotalCPUs + $VM.NumCpu
    Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name vCPUs -Value $VM.NumCpu
    Add-Member -InputObject $ResultObject -MemberType NoteProperty -Name vRAM -Value $VM.MemoryGB
    Export-Csv -InputObject $ResultObject -Path $CSVFile -Append -NoTypeInformation
    $ResultObject | Format-Table -AutoSize
}
Write-Host "Total VMs: $TotalVMs, Total vRAM: $TotalvRAM, Total vCPUs: $TotalCPUs"
Disconnect-VIServer -Server vsphere

↧

SCVMM 2012 SP1: The virtual machine requires NUMA spanning to be disabled

June 5, 2013, 4:38 am

≫ Next: System Center Virtual Machine Manager 2012 SP1 and Hyper-V 2012 Networking Diagram

≪ Previous: Get VMware VM creation date (kind of)

I’m currently designing a system using the (still fairly new) System Center Virtual Machine Manager 2012 SP1 and Windows Server 2012 Hyper-V hosts. The hosts are old HP DL580 G5 servers with four X7350 processors (quad core, 2.93GHz, no hyperthreading or EPT-D/SLAT support).

They don’t support NUMA. The easy way to check for NUMA support on Server 2012 is to open Task Manager, go to the Performance tab, click CPU on the left, and on the graph(s) on the main pane right-click and go into the Change graph to menu. If NUMA nodes is greyed out then you don’t have NUMA available on your hardware:

Or you can use CoreInfo with the -n option which will tell you:

Coreinfo v3.2 - Dump information on system CPU and memory topology
Copyright (C) 2008-2012 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical Processor to NUMA Node Map:
**************** NUMA Node 0

i.e. all the CPUs (represented by a *) are mapped to NUMA node 0.

(For reference, the CoreInfo results from a newer server that does support NUMA would look like:

Logical Processor to NUMA Node Map:
 ********--------  NUMA Node 0
 --------********  NUMA Node 1

from a Dell R710 with 2 x X5560 CPUs).

I’d created some Server 2012 VMs a few months ago and know that I’d been able to migrate them from one of the DL580 hosts to another with no problems. Then a few weeks ago I created some new VMs and just recently tried to migrate them – they wouldn’t move.

I got an error from the Migrate VM Wizard at the Select Host stage, in the Rating Explanation tab:

The virtual machine requires NUMA spanning to be disabled and the host either has numa spanning enabled or does not support NUMA.

So what was different between the new and old VMs? On digging around in the VM settings via PowerShell I found the NumaIsolationRequired property was set to true on the VMs that wouldn’t migrate, but wasn’t set to anything on the ones that would move (it was present but blank, i.e. neither true or false). The older VMs were created using SCVMM2012SP1 beta, and then moved to the release version, so perhaps that explains it.

The newer VMs had been configured to say “I must always run within a NUMA node”, but the hosts don’t support NUMA – and SCVMM isn’t of the mind to say “ok, well I’ll just run you anyway, seeing as I effectively just have one huge NUMA node, so you can never run on a different one”. But what to do about it?

There seem to be two options: Change the hosts or change the VMs.

Change the hosts

Via SCVMM, right-click the host and go to Properties, Hardware, CPU and untick the box Allow virtual machines to span NUMA nodes.
Or, via Hyper-V manager, select the host in the left pane, then choose Hyper-V Settings… under Actions for the host in the right pane. Got to NUMA Spanning, and untick the box Allow virtual machines to span physical NUMA nodes. You’ll then be told that you need to “Restart the Hyper-V Virtual Machine Management service to apply the changes”:

Interestingly, making the change via SCVMM doesn’t prompt you to do this.

Change the VMs

Option 1: From SCVMM, shut down the VM then right-click it, choose Properties, Hardware Profile, under Advanced click Virtual NUMA. Tick the box Allow virtual machine to span hardware NUMA nodes.
Option 2: Using PowerShell:

$VM = Get-SCVirtualMachine -VMMServer RCMSCVMMServer -Name RCMDevVM9
Set-SCVirtualMachine -VM $VM -NumaIsolationRequired $false

Further, the VMs have this option set because it has been configured in the Hardware Profile that the VM was created with, so you might want to change that – in the same way that you’d modify a VM above.

I’d like to point out that as a rule of thumb you probably don’t want your VMs to span NUMA nodes: NUMA spanning allows a process executing on a CPU in one socket to access RAM attached to a CPU in a different socket – which isn’t as efficient as keeping the RAM and CPU local to one another. But in a dev environment and/or if you’re using older hardware you might need to fiddle with the settings as I’ve had to above in order to enable live migrations.

↧

System Center Virtual Machine Manager 2012 SP1 and Hyper-V 2012 Networking Diagram

June 6, 2013, 2:26 am

≫ Next: Microsoft Remote Connectivity Analyser

≪ Previous: SCVMM 2012 SP1: The virtual machine requires NUMA spanning to be disabled

This is my attempt to make sense of the myriad of items that seem to need configuring to be able to get a Hyper-V VM to talk to a particular VLAN. I’ve come from a VMware vSphere 5.0 background, and sorting out the network in that is pretty easy, create a virtual switch, tell it which physical NIC(s) it can talk out on, then create port groups for your VLANs, then connect the VMs to those port group.

In SCVMM 2012 SP1 we have a lot more flexibility, but that means a lot more bits to configure and get linked together in the right places. I can’t guarantee that this is the best way to do it, but it’s working for me.

Click digram for full size version.

↧

Microsoft Remote Connectivity Analyser

July 1, 2013, 4:13 am

≫ Next: PowerShell: Kerberos Constrained Delegation for Hyper-V Live Migration

≪ Previous: System Center Virtual Machine Manager 2012 SP1 and Hyper-V 2012 Networking Diagram

Handy online utilities to check for issues. Can also help decypher SMTP headers.

https://testconnectivity.microsoft.com/

↧

PowerShell: Kerberos Constrained Delegation for Hyper-V Live Migration

July 19, 2013, 6:56 am

≫ Next: PowerShell: Live migration of all VMs from one host to another

≪ Previous: Microsoft Remote Connectivity Analyser

If you want to use Move-VM to live migrate a Hyper-V VM from one host to another then you need to allow the source host to access the destination host to push the VM at it. The preferred way of doing this is by configuring Kerberos Constrained Delegation.

Step one is to ensure that all your hosts are set to use Kerberos as the authentication protocol for Hyper-V Live Migrations. Step two is to add the KCD settings to the hosts’ Active Directory object Delegation settings.

This script does both of the above for all Hyper-V hosts in the specified OU. If you add more hosts to the OU, just run this script again – it won’t complain nor will you end up with multiple KCD entries.

$OU = [ADSI]"LDAP://OU=Hyper-V Hosts,OU=Servers,DC=rcmtech,DC=co,DC=uk"
$DNSSuffix = "rcmtech.co.uk"
$Computers = @{} # Hash table

foreach ($child in $OU.PSBase.Children){
   # add each computer in the OU to the hash table
   if ($child.ObjectCategory -like '*computer*'){
      $Computers.Add($child.Name.Value, $child.distinguishedName.Value)
   }
}

# Process each AD computer object in the OU in turn
foreach ($ADObjectName in $Computers.Keys){
   Write-Host $ADObjectName
   Write-Host "Enable VM Live Migration"
   Enable-VMMigration -ComputerName $ADObjectName
   Write-Host "Set VM migration authentication to Kerberos"
   Set-VMHost -ComputerName $ADObjectName -VirtualMachineMigrationAuthenticationType Kerberos
   Write-Host "Processing KCD for AD object"
   # Add delegation to the current AD computer object for each computer in the OU
   foreach ($ComputerName in $Computers.Keys){
      Write-Host (" Processing "+$ComputerName+", added ") -NoNewline
      $ServiceString = "cifs/"+$ComputerName+"."+$DNSSuffix,"cifs/"+$ComputerName
      Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
      Write-Host ("cifs") -NoNewline
      $ServiceString = "Microsoft Virtual System Migration Service/"+$ComputerName+"."+$DNSSuffix,"Microsoft Virtual System Migration Service/"+$ComputerName
      Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
      Write-Host (", Microsoft Virtual System Migration Service")
   }
}

↧