OPC-UA and time synchronization: Common errors and how to fix them

Sibe Bleuzé

Being part of Factry’s Data Engineering team means we spend a lot of time providing customer support and fixing specific issues in our clients’ setups. We’ve noticed that the same questions come up frequently, and that we spend quite a bit of time helping users fix similar errors.

In this blogpost, we’ll be going over some common OPC-UA and time synchronization errors. We’ll show you what they mean, why they occur and how you can fix them yourself if necessary.

Before we get started:

Throughout the blogpost, we are working with an OPC-UA client reading from an OPC-UA server, polling and/or monitoring multiple NodeIDs, while persisting the received values in some way (Value+Timestamp+Status) for further processing down the line.

The OPC-UA server in this case is publishing values from one or more underlying devices (PLCs, SCADAs, …).

We also assume the reader is familiar with the basics of OPC-UA, such as NodeIDs, polling (periodic read request) vs. monitoring (subscription), OPC-UA status codes, …

OPC-UA (server) limitations and how to work around them

Scaling of the data collection

Timeouts

A common problem encountered when scaling data collection to more and more NodeIDs, possibly coming from more and more devices, is that at some point the OPC-UA server takes longer than desired to respond. If your OPC-UA client implements timing out of requests (which it should), you will receive a BadTimeout (0x800A0000) response.

In most cases, the timeout occurs because of some problem with the OPC-UA server itself. It might have issues communicating with one or more devices or, if you’re requesting data for a lot of NodeIDs at once, it might just be busy gathering all that data and running out of time before that is finished.

A first solution might be to increase the time the client waits for the response of the OPC-UA server. If you’re lucky, that is all you need to do. In other cases, you might encounter a new error, such as the ones discussed below, which will point you more specifically towards the real issue.
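To make this concrete, here is a minimal sketch using the open-source python-opcua (FreeOpcUa) library; the endpoint URL and NodeID are placeholders for your own setup. The client constructor takes a response timeout in seconds, and a bad status such as BadTimeout surfaces as an exception:

```python
from opcua import Client
from opcua.ua import UaStatusCodeError

# Placeholder endpoint and NodeID - replace with your own setup.
client = Client("opc.tcp://localhost:4840", timeout=10)  # wait up to 10 s per response
client.connect()
try:
    node = client.get_node("ns=2;s=Device1.Temperature")
    print("read:", node.get_value())
except UaStatusCodeError as exc:
    # BadTimeout (0x800A0000) and other bad statuses raise this exception
    print("read failed:", exc)
finally:
    client.disconnect()
```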

Typos

Unless you’re using some kind of automated system to expand your configuration to more nodes, it is likely that at some point you will make a typo in the configuration.

Depending on where the typo sits, this might result in errors such as BadNodeIdUnknown (0x80340000) when the typo is in the NodeID, a plain Bad (0x80000000), or data not being collected at the desired rate when the typo is in the request rate.

Typos tend to get spotted faster when someone actually works with the collected data, so clear ownership of the resulting data is beneficial.


Registering nodes for faster reads

As you read more and more NodeIDs in a polling setup, it is beneficial to register them with the OPC-UA server. A register request takes the NodeID, which can be a long name depending on your setup, and returns a short numeric NodeID to the OPC-UA client, which can be used to retrieve data for this node in future read requests.

This speeds up reading data when you have a lot of NodeIDs with long names: using the numeric NodeIDs shrinks your read request, and a smaller request takes less time to transmit to the OPC-UA server.

Most OPC-UA servers will implement a limit to how many NodeIDs can be registered in a single request. A possible indication of having reached this limit is when the OPC-UA server responds to the register request with BadTooManyOperations (0x80100000).

This limit might be configurable from the server side, or you might have to take measures from the client side to accommodate the limit. The latter is done by splitting up the full list of NodeIDs to be registered into multiple register requests.
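As an illustration, a client-side split might look like this with python-opcua; the batch size of 1000 and the NodeIDs are made-up examples, and the real limit should come from your server’s configuration:

```python
from opcua import Client

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

client = Client("opc.tcp://localhost:4840")  # placeholder endpoint
client.connect()

# Placeholder NodeIDs with long string identifiers.
nodes = [client.get_node("ns=2;s=Line1.Machine%d.Speed" % i) for i in range(5000)]

registered = []
for batch in chunks(nodes, 1000):  # stay below the server's per-request limit
    registered.extend(client.register_nodes(batch))
# 'registered' now holds the short server-assigned NodeIDs for future reads.
```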

Grouping nodes for reading

Next to registering the nodes you’ll frequently want to read data from, you also want to bundle multiple reads into one read request, to reduce the request overhead per node. While at first you might get away with polling everything at once, this approach might get you in trouble as your setup grows.

Most OPC-UA servers implement a limit on the number of nodes in a single read request. Like the limit for register requests, this might also be configurable, but in some cases it is necessary to have a solution from the client side as well.

Next to lowering the request size, grouping the nodes into batches allows you to reduce the impact of a small number of nodes being unresponsive.

For example, if your request times out after 10 seconds and 20 of the nodes in a 5000-node request together take 9 seconds to respond, the client will probably receive a timeout status for all 5000 nodes instead of just those 20. If there had only been 500 nodes in the request, the server might have been able to gather the remaining 480 in the remaining second, and only 20 nodes would have received a bad status.
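In code, batching also isolates failures per batch rather than per full request. A sketch, continuing the previous example (it reuses the chunks() helper, the client, and the registered node list from the registration sketch):

```python
from opcua.ua import UaStatusCodeError

results = {}
for batch in chunks(registered, 500):
    try:
        # One multi-node read request per batch of at most 500 nodes.
        for node, value in zip(batch, client.get_values(batch)):
            results[node] = value
    except UaStatusCodeError as exc:
        # Only this batch is affected by e.g. BadTimeout; others still succeed.
        for node in batch:
            results[node] = exc
```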


Looking back at this example, it would also have been beneficial if the server had had a timeout in place when reading values from the underlying devices. If your OPC-UA server has this kind of functionality available, or it is able to temporarily ignore devices with a slow connection (this is called demotion), it is wise to set this up as much as possible.

Don’t overload your server

While the reasoning above is valid to keep a few unresponsive nodes from influencing too many others, you should keep an eye on the total number of requests being sent to the OPC-UA server.

In normal circumstances, an OPC-UA server can handle at least a few thousand nodes per read request. If you split them into many small batches, however, you just create more requests to the server. That can also hurt performance, especially if the request interval is small, as is often the case for real-time data processing.

Therefore, it is advisable to set up any measures to deal with unresponsive nodes in the OPC-UA server rather than in the client, so that you can make optimal use of multi-node read requests. Only when the server is managed externally should you fall back on the client-side measures discussed before.

Large scale monitoring

If you need sub-second precision or you only need to receive the values on change, you’re likely to be using the subscription functionality embedded in OPC-UA, also referred to as monitoring. This mechanism makes it the responsibility of the OPC-UA server to provide your client with the values it subscribed to. Because this requires resources on the server side, most servers implement (possibly configurable) limits on the number of monitored items per client and in total.
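A minimal monitoring sketch with python-opcua; the 500 ms publishing interval, endpoint, and NodeID are example values:

```python
from opcua import Client

class SubHandler:
    """Callback object; invoked by the client's subscription thread."""
    def datachange_notification(self, node, val, data):
        dv = data.monitored_item.Value
        print(node, val, dv.StatusCode, dv.SourceTimestamp)

client = Client("opc.tcp://localhost:4840")  # placeholder endpoint
client.connect()

sub = client.create_subscription(500, SubHandler())  # 500 ms publishing interval
sub.subscribe_data_change(client.get_node("ns=2;s=Tank1.Level"))
# Keep the program running; the server now pushes value changes to the handler.
```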

When your client exceeds either of these limits, the server will respond with BadTooManyMonitoredItems (0x80DB0000). You can either increase the limit defined in the server or, if that is not possible, split the data collection over multiple clients.

In some cases, this response might occur even if the number of monitored items does not exceed the limit. Usually, this is because a previous session of the client was not closed correctly in the OPC-UA server, so its monitored items are effectively counted twice.

The solution is then to find the lingering session in the OPC-UA server and close it manually, so your actual session will have access to the correct number of monitored items again.

Encryption

If your OPC-UA client is installed on the same machine as the server and connects over the localhost network, it is safe to do so without encryption. In this scenario, encrypting the data as it leaves the machine matters more than encrypting traffic between applications on the same host.

When your OPC-UA traffic travels between different machines, you’ll probably want to enable encryption. The OPC-UA protocol provides this functionality in the form of different security modes and policies.

Setting up encryption can be done by enabling a few settings in both the OPC-UA server and client. If the Sign&Encrypt security mode is used, you might have to accept the client’s certificate on the server side. The server itself might also have a certificate; all you need to do with that one is make sure it gets renewed before it expires. Usually that involves nothing more than a few clicks in the server settings.
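With python-opcua, for example, the client side of such a setup is a single call; the certificate and key file paths here are placeholders:

```python
from opcua import Client

client = Client("opc.tcp://server-host:4840")  # placeholder endpoint
# Format: "<security policy>,<mode>,<client certificate>,<client private key>"
client.set_security_string(
    "Basic256Sha256,SignAndEncrypt,my_cert.pem,my_key.pem"
)
client.connect()  # the server may still need to trust (accept) this certificate
```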

Timestamps

Polled data collection

If you’re using read requests at regular intervals (polling) to fetch data from the OPC-UA server, the server will provide a timestamp in the result. It’s also possible to add a timestamp from the client side if you need the timestamps to be aligned for processing purposes. In the first case, if the server’s clock is out of sync, any processing of this data further down the line will be affected. In the second case, the client’s clock is the one that matters.

For this reason, in a polling scenario it is important that whichever clock determines the timestamp is accurate, for example by synchronizing it with an NTP server. For installations connected to the internet, such NTP servers are readily available.

The same cannot be said for offline installations, such as those generally found in industrial production environments. These need an internal NTP server set up within the closed production network so they too can be synchronized.
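To check how far a machine’s clock has drifted, you can query an NTP server directly, for example with the third-party ntplib package; pool.ntp.org stands in for whatever public or internal NTP server your installation uses:

```python
import ntplib

# On closed networks, replace pool.ntp.org with your internal NTP server.
response = ntplib.NTPClient().request("pool.ntp.org", version=3)
print("clock offset vs NTP server: %.3f seconds" % response.offset)
```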

If the timestamp of the server is used, you should check where the server fetches it from. It could be an internal timestamp, or one fetched from the underlying device the data is coming from. In the latter case, you’ll have to synchronize that device too to receive accurate timestamps on your data.
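In python-opcua, a single read exposes all of these timestamps, so you can inspect what your server actually returns; the endpoint and NodeID are again placeholders:

```python
from datetime import datetime, timezone
from opcua import Client

client = Client("opc.tcp://localhost:4840")  # placeholder endpoint
client.connect()

dv = client.get_node("ns=2;s=Device1.Temperature").get_data_value()
client_ts = datetime.now(timezone.utc)  # client-side timestamp, taken on receipt

print("value:           ", dv.Value.Value)
print("source timestamp:", dv.SourceTimestamp)  # set where the value originated
print("server timestamp:", dv.ServerTimestamp)  # set by the OPC-UA server
print("client timestamp:", client_ts)
```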

Timestamp accuracy with timeouts

To account for increased response times of the OPC-UA server, you might increase the timeout on the client side. A side effect is that the accuracy of any client-side generated timestamps decreases: a response that arrives just before, say, a 30-second timeout expires carries a client timestamp that can be tens of seconds later than when the value was actually read.

Monitored data collection

For monitored nodes, you should use the timestamp provided by the OPC-UA server. This allows you to see very accurately when the change happened, which might be important in for example alarm monitoring.

Because the OPC-UA server provides the timestamp in this case, it is essential to have the time synchronization in place on the server.

Since the timestamp of a datapoint is important for monitored values, it helps to understand what happens when a client is disconnected for a while and then reconnects. The behaviour in this case is controlled by the OPC-UA server and typically follows one of three patterns:

  1. The server can resend the last update of the monitored node with the original timestamp, so the client can catch up from there. This potentially sends the same datapoint twice (see the sketch after this list).
  2. The server could also resend the last value, but with the timestamp when the connection was re-established. This gives you a visible confirmation that the connection is ok again, but does potentially result in two consecutive points with the same value.
  3. The third and last possibility is that the server does not send a datapoint on reconnect at all and just waits for the next update of the value to send it.
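If your server follows the first behaviour, the client can filter out the repeated datapoint by remembering the last (timestamp, value) pair per node. A sketch, reusing the handler shape from the monitoring example; persist() is a hypothetical downstream storage function:

```python
class DedupHandler:
    """Drops a notification if both timestamp and value match the previous one."""
    def __init__(self):
        self.last = {}  # node -> (source timestamp, value)

    def datachange_notification(self, node, val, data):
        ts = data.monitored_item.Value.SourceTimestamp
        if self.last.get(node) == (ts, val):
            return  # same point resent after a reconnect; skip it
        self.last[node] = (ts, val)
        persist(node, ts, val)  # hypothetical persistence function
```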

The simplest solution is often the best

If you’re frequently in contact with an IT support provider, you’re probably familiar with the phrase: “Have you tried turning it off and on again?”


They’re not mocking you: it could actually help to restart an OPC-UA client or server to fix an issue. There could be an old or invalid value stuck in memory, or the software could have gotten stuck in a very specific scenario that only occurs once in a blue moon.

Even if it doesn’t solve the issue, a restart might still produce some log messages that will in the end help you get to the solution faster.

If all of the above could not fix your problem, you might come to the conclusion that there is a bug in your OPC-UA client or server. When reporting this to the software provider, the first thing they will ask you is the version of the software you’re using. Bugs are reported all the time and simply using the latest and greatest version may already help you overcome the problem you are experiencing.

Where does Factry stand in all of this?

Factry has hundreds of OPC-UA data collectors active in production environments all around the world. Together with our Factry Historian, they are part of the data pipeline and make production data available to thousands of users.

If you’re interested in our OPC-UA collector, you can find the documentation here: https://docs.factry.cloud/docs/opc-ua/latest/

You’ll see some of the error codes discussed above are also indicated at the bottom of this documentation page: https://docs.factry.cloud/docs/opc-ua/latest/basics

If you’re interested in OPC-UA in general, you can read all about it in the official OPC-UA foundation reference documents here: https://reference.opcfoundation.org/
