Database Not Connected Hardware Alarm - CitectSCADA 2018 R2 Update 4

I have been struggling with an issue here. In my hardware alarms, I have an alarm that states "Database Not Connected". When I go to the Alarm Summary page, I see thousands of alarms saying "Login attempt failed from <ip address> - unknown user" (the IP addresses in the message are those of my servers). When I look at my tracelog file, I see...

2020-03-22 10:47:27.747 -07:00 15492 0 Error AlarmClientAdaptor LegacyAdaptor::OnDataError ViewType=Display hCtrl=3 Error=DataRequestTimeout Message=Data not available Cluster=Stanton_U2
2020-03-22 10:47:27.747 -07:00 15492 0 Error AlarmClientAdaptor LegacyAdaptor::OnDataError ViewType=Display hCtrl=3 Error=DataRequestTimeout Message=Data not available Cluster=Stanton_U1
2020-03-22 10:47:27.747 -07:00 15492 0 Error AlarmClientAdaptor LegacyAdaptor::OnDataError ViewType=Display hCtrl=3 Error=DataRequestTimeout Message=Data not available Cluster=Stanton_U0

...when I look at my tracelog for the alarm server, I see...

2020-03-22 10:49:49.989 -07:00 15328 0 Error AlarmServerComms Exception An error occurred using the .NetApi Client in LogOn: {0} ClearScada.Client.AccessDeniedException: The username or password was incorrect.
at ClearScada.Client.Advanced.ScxComClient.ProcessServerException(Int32 requestCode)
at ClearScada.Client.Advanced.ScxComClientTcp.SendRequest(Int32 requestCode)
at ClearScada.Client.Advanced.ScxComLinkServer.LogOn(String userName, SecureString password, ILogonInformation& logonInformation)
at ClearScada.Client.Advanced.ScxComLinkServer.LogOn(String userName, SecureString password)
at ClearScada.Client.Simple.Connection.LogOn(String userName, String password)
at SchneiderElectric.Alarm.Server.Connection.Manager.ClearScadaClientApiConnection.LogOn(String userName, String password)

We have configured roles to use our corporate domain logins, plus a few additional Citect users for the API connection used by the Wonderware Historian connector and for kernel access.

We get these errors no matter which client we run, even the one on the servers. We have also shut down the connector and all remote clients, with the same errors. I am beginning to think this is a bug of sorts, as these errors have added up to about 7 GB of alarm event storage data in the last 12 days.

We have also tried running the alarm servers in 64-bit mode, with the same result.

We are running 2 physical servers, each with 3 clusters assigned to them. We have manually defined the port numbers for the second and third server processes so that they can coexist.

Because we run our clients and servers inside our own network, we have the Windows firewalls turned off, but just for good measure, we have also allowed all traffic on all ports and network types on both the servers and all clients.

What user name are the logs referring to? We have mapped the appropriate domain user groups to the Citect.**** groups. These errors still occur even if nobody is logged in to the Citect client; it seems to be a server thing... but I'm not even sure that's accurate.
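One way to check whether the `DataRequestTimeout` errors in the first tracelog excerpt hit every cluster at once, rather than a single misconfigured cluster, is to tally the `AlarmClientAdaptor` lines by cluster. The following is a minimal Python sketch; the regular expression is an assumption inferred only from the three sample lines quoted above, and `count_errors_by_cluster` is a hypothetical helper, not part of Citect.

```python
import re
from collections import Counter

# Assumption: pattern inferred only from the three AlarmClientAdaptor lines
# quoted in the question; other Citect versions may format trace lines differently.
# Matching "Error=" (with the equals sign) skips the "Error AlarmClientAdaptor" level field.
TRACE_RE = re.compile(r"Error=(?P<error>\S+).*?Cluster=(?P<cluster>\S+)")


def count_errors_by_cluster(lines):
    """Hypothetical helper: tally (cluster, error) pairs found in trace lines."""
    counts = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            counts[(match.group("cluster"), match.group("error"))] += 1
    return counts


if __name__ == "__main__":
    prefix = (
        "2020-03-22 10:47:27.747 -07:00 15492 0 Error AlarmClientAdaptor "
        "LegacyAdaptor::OnDataError ViewType=Display hCtrl=3 "
        "Error=DataRequestTimeout Message=Data not available Cluster="
    )
    sample = [prefix + "Stanton_U2", prefix + "Stanton_U1", prefix + "Stanton_U0"]
    for (cluster, error), n in sorted(count_errors_by_cluster(sample).items()):
        print(f"{cluster}: {error} x{n}")
    # Stanton_U0: DataRequestTimeout x1
    # Stanton_U1: DataRequestTimeout x1
    # Stanton_U2: DataRequestTimeout x1
```

To run it against a real tracelog, pass an open file object instead of `sample` (for example `open(path, encoding="utf-8", errors="replace")`; the path and encoding are assumptions). Identical timestamps across all clusters may hint at a shared cause, such as the alarm servers rejecting the connection.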

  • I think I'm narrowing it down. In the alarm server process logs, I see the following...

    22-MAR-2020 20:43:56.247 4310 [SCX] 8:12 Logon( IN: User NT AUTHORITY\SYSTEM, AllowCachedLogon False )
    22-MAR-2020 20:43:56.247 4310 [SCX] 8:12 Logon( OUT: Hr 80070005 -> Access is denied., ObjectId -1 )

    Any clues on how to troubleshoot further?
  • Chris,
    are you maybe running Citect as a service, with [client]autologinmode set to log in the current Windows user?
  • Erik,

    Yes, as a test I set the parameter to 0 on the servers and that fixed my last issue, but I still have all of my previous issues. It seems to be worse with encryption enabled, but even with it disabled, I still get the same errors everywhere.

    Will I need to turn that parameter off on all of my clients too?

    Thanks, Chris
  • Chris,

    Will I need to turn that parameter off on all of my clients too?

    If all your clients are running the "Citect Runtime Manager" service, and all of them use the default "Log On as Local System Account" setting for this service, then that might be the easiest option. This is somewhat related to https://softwaresupportsp.aveva.com/#/okmimarticle/docid/tn9059.

    Alternatively, you can create a role in your Citect project that includes NT AUTHORITY\SYSTEM (e.g. the local Administrators group), or you could consider running the Citect Runtime Manager service under another user account.

  • Erik,

    I think I understand. Below are the roles we have configured...

    BUILTIN\Administrators

    WEC\CitectAdmins

    WEC\CitectOperators

    WEC\CitectTechnicians

    WEC\CitectVisitors

    WEC\Domain Users

    Here are two more I just added...

    NT Authority\SYSTEM

    LocalSystem


    I still get the error above, and the errors below have come back too...

    23-MAR-2020 13:40:57.793 27AC [SCX]  5:10 Logon( IN: User NT AUTHORITY\SYSTEM, AllowCachedLogon False )

    23-MAR-2020 13:40:57.793 27AC [SCX]  5:10 Logon( OUT: Hr 80070005 -> Access is denied., ObjectId -1 )


    I'm also getting these errors, which is strange to me because Windows Firewall is off and I can ping the servers from each other:

    2020-03-23 06:40:00.438 -07:00 5652  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:64940 --> 192.168.30.41:2080 #18] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:2080

    2020-03-23 06:40:00.777 -07:00 5652  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:64963 --> 192.168.30.41:2084 #36] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:2084

    2020-03-23 06:40:01.017 -07:00 8812  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:64976 --> 192.168.30.41:12080 #15] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:12080

    2020-03-23 06:40:01.033 -07:00 16964  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:64979 --> 192.168.30.41:22084 #30] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:22084

    2020-03-23 06:40:01.591 -07:00 15612  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:65005 --> 192.168.30.41:12084 #33] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:12084

    2020-03-23 06:40:04.330 -07:00 12092  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:65059 --> 192.168.30.41:2082 #3] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:2082

    2020-03-23 06:40:25.132 -07:00 1196  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:65257 --> 192.168.30.41:22080 #12] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:22080

    2020-03-23 06:40:25.792 -07:00 12092  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:65294 --> 192.168.30.41:2084 #36] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:2084

    2020-03-23 06:40:26.070 -07:00 8812  0 Error Transport TcpipTransport::EndConnect() [CLIENT 0.0.0.0:65308 --> 192.168.30.41:12080 #15] SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.30.41:12080


    I can confirm that the alarm servers are communicating with each other, because I can delete the alarm history on one server and it gets backfilled from the running one.

    Chris

  • This may not be related, but have you tried running the Computer Setup Wizard on all machines, making sure that all server passwords are identical?
  • Patrick,

    I used the Configurator to set those, as we use the deployment mechanism. But I did go ahead and reset them, and we get the same result.

    It seems to be related only to the alarm server processes; the trend, report and I/O servers are fine. Super frustrating. After 24 hours, my alarm summary page takes a long time to load, and it's filled with "logon attempt failed from <server IP address> - unknown user" messages.

    Thanks for the suggestion,

    Chris
  • Chris,

    Did you try setting [client]autologinmode=0?

    Based on what you have shared, a bit more investigation might be necessary, so it is probably best to raise this with AVEVA support.
  • Chris,
    You probably tried this already, but if not:
    Do the errors persist if you delete the alarm databases on both servers (after stopping the alarm servers first)?
  • Patrick, Erik,

    I finally found a workaround of sorts. On my primary server, I had to add the secondary server's computer name to the Administrators group, and add the primary server to the Administrators group on the secondary server. All of my issues have now been resolved!

    What I'd like to know is why I needed to do this at one of our sites when the rest of our fleet works fine without it.

    Thanks for the input,

    Chris
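The `Logon( OUT: Hr 80070005 -> Access is denied., ObjectId -1 )` lines quoted in the thread can be read using the standard Windows HRESULT layout: bit 31 is the failure flag, bits 16-26 are the facility, and the low 16 bits are the code. For `0x80070005` that gives facility 7 (`FACILITY_WIN32`) and code 5 (`ERROR_ACCESS_DENIED`), i.e. the logon as `NT AUTHORITY\SYSTEM` was rejected for authorization reasons rather than failing on the network. The minimal Python sketch below pairs each SCX `Logon( IN ...)` line with its `OUT` line and lists failed logons. The regular expressions, and the assumption that the `8:12`-style token links an IN line to its OUT line, are inferred only from the samples in this thread; `decode_hresult` and `failed_logons` are hypothetical helpers.

```python
import re

# Assumption: layouts inferred only from the SCX trace lines quoted in this thread, e.g.
#   ... [SCX] 8:12 Logon( IN: User NT AUTHORITY\SYSTEM, AllowCachedLogon False )
#   ... [SCX] 8:12 Logon( OUT: Hr 80070005 -> Access is denied., ObjectId -1 )
# The "8:12"-style token is assumed to link an IN line to its OUT line.
IN_RE = re.compile(r"\[SCX\]\s+(?P<conn>\S+)\s+Logon\( IN: User (?P<user>[^,]+),")
OUT_RE = re.compile(
    r"\[SCX\]\s+(?P<conn>\S+)\s+Logon\( OUT: Hr (?P<hr>[0-9A-Fa-f]{8}) -> (?P<text>[^,]+),"
)


def decode_hresult(hr):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    return (hr >> 31) & 0x1, (hr >> 16) & 0x7FF, hr & 0xFFFF


def failed_logons(lines):
    """Return (user, hresult, message) for each logon whose HRESULT has the failure bit set."""
    pending = {}  # conn token -> user name from the matching IN line
    failures = []
    for line in lines:
        match = IN_RE.search(line)
        if match:
            pending[match.group("conn")] = match.group("user")
            continue
        match = OUT_RE.search(line)
        if match:
            hr = int(match.group("hr"), 16)
            user = pending.pop(match.group("conn"), "<unknown>")
            severity, _facility, _code = decode_hresult(hr)
            if severity:  # bit 31 set means the call failed
                failures.append((user, hr, match.group("text")))
    return failures


if __name__ == "__main__":
    sample = [
        r"22-MAR-2020 20:43:56.247 4310 [SCX] 8:12 Logon( IN: User NT AUTHORITY\SYSTEM, AllowCachedLogon False )",
        r"22-MAR-2020 20:43:56.247 4310 [SCX] 8:12 Logon( OUT: Hr 80070005 -> Access is denied., ObjectId -1 )",
    ]
    for user, hr, text in failed_logons(sample):
        _severity, facility, code = decode_hresult(hr)
        print(f"{user}: 0x{hr:08X} facility={facility} code={code} ({text})")
    # NT AUTHORITY\SYSTEM: 0x80070005 facility=7 code=5 (Access is denied.)
```

Running it over a full alarm server log would show which accounts are being refused and how often; if every failure is `NT AUTHORITY\SYSTEM`, that is consistent with the service-account explanation discussed above.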
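Regarding the `TcpipTransport::EndConnect()` errors: a successful ping only shows that ICMP echo gets through, not that anything is accepting TCP connections on ports such as 2080 or 12084. The message in those lines ("did not properly respond after a period of time") describes a connect timeout, which usually means packets are dropped or unanswered; a reachable host with nothing listening on the port normally produces a "connection refused" error instead. The minimal Python sketch below, intended to be run on the client that logs the errors, lists the failing host:port targets and can probe each with a plain TCP connect. The regular expression is inferred only from the quoted lines, and `failed_targets`, `probe` and the `--probe` flag are hypothetical.

```python
import re
import socket
import sys

# Assumption: pattern inferred only from the EndConnect() lines quoted in this thread,
# e.g. "[CLIENT 0.0.0.0:64940 --> 192.168.30.41:2080 #18]"; "--> " marks the target side.
TARGET_RE = re.compile(r"--> (?P<host>[\d.]+):(?P<port>\d+)")


def failed_targets(lines):
    """Return the sorted, de-duplicated (host, port) targets of EndConnect() errors."""
    targets = set()
    for line in lines:
        if "EndConnect()" not in line:
            continue
        match = TARGET_RE.search(line)
        if match:
            targets.add((match.group("host"), int(match.group("port"))))
    return sorted(targets)


def probe(host, port, timeout=3.0):
    """Attempt a plain TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing is listening on that port
    except socket.timeout:
        return "timeout"  # no answer at all, as in the quoted log lines
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    sample = [
        "2020-03-23 06:40:00.438 -07:00 5652  0 Error Transport TcpipTransport::EndConnect() "
        "[CLIENT 0.0.0.0:64940 --> 192.168.30.41:2080 #18] SocketException: ...",
        "2020-03-23 06:40:01.017 -07:00 8812  0 Error Transport TcpipTransport::EndConnect() "
        "[CLIENT 0.0.0.0:64976 --> 192.168.30.41:12080 #15] SocketException: ...",
    ]
    for host, port in failed_targets(sample):
        # Only touch the network when explicitly asked to.
        status = probe(host, port) if "--probe" in sys.argv else "not probed"
        print(f"{host}:{port} {status}")
```

Comparing the results for the alarm ports with those for the trend, report and I/O server ports (which the thread reports as working) may help show whether the problem is specific to the alarm server processes.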