CitectSCADA and VMWare snapshot backup

Hello everyone,

We are using CitectSCADA 2016 with two redundant servers. Each server runs in a virtual machine (VMWare). Everything works fine except that sometimes (about once or twice a week) the backup server takes control for a few seconds (the IODevices switch to the standby), and then they go back to the primary server. Since this always happens between 22h30 and 23h00, we are trying to find out what causes it.

After discussing this with our IT department, which manages the servers, we learned that they back up the VMs at about that time every day by taking a "snapshot" of each virtual machine. We are wondering whether taking this snapshot could cause a "glitch" on the primary server, so that the standby server loses sight of it for a short period and takes control.
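One way this hypothesis might be checked from inside the guest is a minimal sketch like the one below. It is a small stand-alone Python script, not part of Citect, and the names and thresholds are hypothetical. It wakes up every 0.1 s and reports (a) stalls, where much more time passed between wake-ups than expected, and (b) wall-clock jumps, where the Windows clock moved by a different amount than the monotonic clock. It assumes the guest's monotonic clock reflects the time the VM was paused once it resumes. That depends on the hypervisor's timekeeping, so the absence of a report does not prove the VM was never paused.

```python
import time
from datetime import datetime

# Hypothetical values: pick a threshold well below your failover timeout.
SAMPLE_INTERVAL_S = 0.1   # how often the loop wakes up
THRESHOLD_S = 1.0         # anomalies larger than this are reported


def check_sample(prev_mono, mono, prev_wall, wall, threshold):
    """Compare two consecutive samples and return a list of anomalies."""
    anomalies = []
    elapsed = mono - prev_mono             # time between wake-ups (monotonic clock)
    drift = (wall - prev_wall) - elapsed   # wall-clock change not explained by elapsed time
    if elapsed > threshold:
        anomalies.append(f"stall of {elapsed:.2f}s")
    if abs(drift) > threshold:
        anomalies.append(f"wall clock jumped {drift:+.2f}s")
    return anomalies


def monitor(duration_s, threshold=THRESHOLD_S):
    """Sample both clocks for duration_s seconds and print any anomalies."""
    start = prev_mono = time.monotonic()
    prev_wall = time.time()
    while prev_mono - start < duration_s:
        time.sleep(SAMPLE_INTERVAL_S)
        mono, wall = time.monotonic(), time.time()
        for anomaly in check_sample(prev_mono, mono, prev_wall, wall, threshold):
            print(f"{datetime.now().isoformat(timespec='seconds')} {anomaly}", flush=True)
        prev_mono, prev_wall = mono, wall
```

Starting `monitor(3600)` on the primary server around 22h15 and comparing any reported timestamps with the times the IODevices switched to the standby would show whether the two coincide.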

Has anyone had similar experiences with VMWare? Are there any "best practices" concerning VM configuration and backups when using Citect?

Thanks for your help,

Patrice Jacob

Prosystech inc.

  • Hi Patrice,
    We have run Citect on VMware ESXi for several years and I haven't seen a snapshot cause IO devices to fail over to the standby. It probably depends heavily on the physical hardware it runs on and the backup software you are using, though, so it's hard to comment.

    Just a small note that if you are using software licenses on these VMs then the licenses will be broken if you try to restore the backup. We're waiting for a better license solution to address this issue.
    Tim Marz
  • Hi Tim, you can use AnywhereUSB to get around the license issue.
  • Hi Patrice,
    You could ask IT how they perform the backup and whether it creates any glitches in the TCP/IP communication.
    I think it would also be best to contact support.
  • Yes, AnywhereUSB can be used. We use these as well, but have started moving towards soft licensing so that the system isn't reliant on another piece of hardware, along with the firmware and driver updates it periodically requires. It's not nice having a single device that can lock up and take down all your licenses (or half of them) at once. Ideally, soft licenses will provide higher reliability, as long as these issues with snapshots and license distribution are addressed.

    If you do use AnywhereUSB, it's a good idea to split your licenses across two of them in different locations and keep a spare. Also keep a record of driver and firmware versions so you can swap one out quickly when needed.
  • Hi Patrice,

    I have not encountered a switchover to standby because we use a cold standby solution (snapshots are copied periodically to standby hardware, which is not active until it is needed). However, we do experience similar issues, such as communication interruptions and system clock jumps, while a VMWare snapshot is being taken.

    Regards,
    Patrick
  • Is there any setting in VMWare that may create these glitches?
    I'm not an expert, but I suspect the VMWare Manager settings play a role here.
    There must be some tricks to it; otherwise everybody would be complaining about AWS!
  • I believe the underlying issue is that VMWare puts the VM in a halted state during the backup (possibly using the Windows Volume Shadow Copy Service in the guest OS). If taking the snapshot takes too long, the redundant server loses its connection to the primary and becomes active until the backup has finished.

    I'm not a VMWare expert, but a Google search suggests that the "Quiesce" option might be related to this. Toggling this option might have some unwanted side effects, however, especially on domain controllers and machines that run databases, if I understand the article correctly.
  • Thanks everyone for your responses. I have forwarded this thread to the Server IT guys and they will have a look at it.
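The reasoning above about the VM being halted during the backup can be illustrated with a simplified model. This is not Citect's actual redundancy logic. It assumes a hypothetical standby that becomes active once it has not heard from the primary for `timeout_ms`, and that hands control back as soon as heartbeats resume. The heartbeat interval, timeout and pause lengths are made-up numbers.

```python
def heartbeat_times(interval_ms, end_ms, pause_start_ms, pause_len_ms):
    """Times (ms) at which the primary sends heartbeats, skipping the window
    [pause_start_ms, pause_start_ms + pause_len_ms) while the VM is halted."""
    pause_end_ms = pause_start_ms + pause_len_ms
    return [
        t for t in range(0, end_ms, interval_ms)
        if not (pause_start_ms <= t < pause_end_ms)
    ]


def failovers(beats, timeout_ms):
    """Return (takeover_ms, handback_ms) pairs: the standby takes over once
    timeout_ms has passed since the last heartbeat, and hands control back
    when the next heartbeat arrives."""
    events = []
    for prev, curr in zip(beats, beats[1:]):
        if curr - prev > timeout_ms:
            events.append((prev + timeout_ms, curr))
    return events


# A 0.5 s halt is absorbed by a 3 s timeout; a 5 s halt is not.
short = heartbeat_times(1000, 10_000, 3000, 500)
long_ = heartbeat_times(1000, 10_000, 3000, 5000)
print(failovers(short, 3000))  # []
print(failovers(long_, 3000))  # [(5000, 8000)]
```

In this model a switchover happens only when the gap between heartbeats exceeds the timeout. If the length of the pause varies from night to night, the model would predict occasional short switchovers rather than nightly ones, which would be consistent with the once-or-twice-a-week pattern described in the question.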