The lifeblood of autonomy: why data is so important for our company

Peter Härslätt

2023-05-17

4 min

Blog

Autonomous Next

Safety

It was the American engineer and statistician Dr William Edwards Deming who famously said, “without data, you’re just another person with an opinion.” A statement that rings as true today as it did decades ago.

Put simply, to succeed as a company within autonomous solutions, we’re heavily dependent on data and software development. After all, it’s data that reveals how we stay on track as a company, how we improve autonomous vehicle functionality, how we generate new revenue models, and how we develop and build our products, based on customer needs and in the safest way possible.

Data is the lifeblood of our industry – there’s no prospect of creating an autonomous service without it.

A truckload of data

Everything we build must be robust, automated, and scalable. In one session, an autonomous vehicle is capable of producing a colossal amount of data: 1 petabyte, to be precise. That’s the equivalent of 4 years of high-definition television. Mind-blowing, when you think about it. So, how do we go about collecting and processing such a huge amount of data – that is, how do we make it useful?

That question is best answered with a model called the 5 Vs of data, which enables us to evaluate the data we collect and its usefulness to us as a company. All five need to be considered before we start the data collection process. [1]

1. Volume – first we must assess the size and amount of data that needs to be processed. In the case of autonomous trucks, this is usually very sizeable.

2. Value – this is the most important “V” from a business perspective. We must understand what value the data we’re collecting will provide. The value of big data usually comes from insight discovery and pattern recognition that lead to more effective operations, stronger customer relationships and other clear and quantifiable business benefits – for us at Volvo Autonomous Solutions, this could be technological advancement or monetary gain.

3. Variety – when we test, we need to collect a diverse range of data types, including unstructured data, semi-structured data and raw data. The more diverse the data, the more complex it is to handle, but often the more useful it is.

4. Velocity – this is the speed at which we receive, store and manage data. As you can imagine, this is a very important part of the process.

5. Finally, Veracity – the accuracy of data, which is naturally of high importance. A small amount of “true” data is far more useful than a large set that’s unreliable.

The 5 Vs model illustrates the variety of angles that need to be covered when working professionally with data – without them, we wouldn’t be able to use data at scale for development and testing of autonomous functionality.
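One way to make "consider all five Vs before collecting" concrete is a simple checklist that refuses to sign off until every V has an answer. The sketch below is only an illustration: the class name, field names, and example answers are hypothetical and not taken from any Volvo Autonomous Solutions tooling.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FiveVsAssessment:
    """One free-text answer per V; None means 'not yet assessed'."""
    volume: Optional[str] = None
    value: Optional[str] = None
    variety: Optional[str] = None
    velocity: Optional[str] = None
    veracity: Optional[str] = None

    def missing(self) -> List[str]:
        # Names of the Vs that still have no answer.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def ready_to_collect(self) -> bool:
        # Collection should only start once all five Vs are covered.
        return not self.missing()


# Hypothetical, partially completed assessment.
plan = FiveVsAssessment(
    volume="about 1 PB per session",
    value="improve perception in rain",
)
print(plan.missing())           # the Vs still to be assessed
print(plan.ready_to_collect())  # False until every V has an answer
```

Keeping the answers as free text is a deliberate simplification; a real process would likely attach owners, thresholds, or review dates to each V.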

Real world vs Simulations

The next logical question to ask ourselves is: where does the data come from?

First, we define which data is needed, retrieving parts of it remotely over the air through wireless protocols, and parts of it on physical disks. The data collection mechanism covers a wide range – from basic signal conditions to advanced analytical algorithms processing data in the edge nodes, or even federated learning. Federated learning is when a model is trained in situ, where the data is generated (e.g., in each vehicle), so that only the resulting model updates need to be shared and the raw data doesn’t even need to be collected. It’s safe to say, when it comes to data collection, we’re only getting started.
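To illustrate the federated idea, here is a minimal sketch of federated averaging, one common scheme, on a toy one-feature linear model. It is an assumption-laden example, not a description of our production setup: the function names, the tiny data sets, and the learning rate are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # hypothetical (input, target) pair
Model = Tuple[float, float]    # (weight, bias) of y = w * x + b


def local_update(model: Model, data: List[Sample],
                 lr: float = 0.05, epochs: int = 5) -> Model:
    """Runs on one vehicle: a few gradient steps on its own data only."""
    w, b = model
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Model], sizes: List[int]) -> Model:
    """Runs centrally: sees model parameters, never the raw samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train(fleet: List[List[Sample]], rounds: int = 200) -> Model:
    model = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(model, data) for data in fleet]
        model = federated_average(updates, [len(d) for d in fleet])
    return model


# Two hypothetical vehicles, each holding samples of y = 2x + 1.
fleet = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(fleet)
print(f"w={w:.3f}, b={b:.3f}")
```

Only the `(weight, bias)` pairs cross the vehicle boundary, which is the property that makes the approach attractive when raw data is too large or too sensitive to move.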

Today, we collect a variety of data from real-world tests and 3D simulations. But why, when we have test tracks and mighty machines and vehicles, do we need to rely on simulations? The answer is simple: time, logistics, and speed.

In the virtual world, we’re able to test and iterate far beyond what physical testing allows. For example, we can manipulate parameters in any way we want – whether that’s terrain, agents, or weather – with the ability to add multiple eventualities. This would be the equivalent of millions of test cases, logistically impossible to replicate (or at least extremely costly and time-consuming) in the real world. In simulation, we can control pretty much everything, running scenarios in parallel and at speeds unfathomable in the physical world.
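The combinatorial growth of test cases, and the ability to run them side by side, can be sketched in a few lines. Everything here is illustrative: the parameter values are made up, and `run_scenario` is a hypothetical stand-in for a call into a real simulator.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical parameter values; a real sweep would have far more of each.
TERRAINS = ["flat", "gravel", "steep"]
AGENT_COUNTS = [0, 5, 20]
WEATHERS = ["clear", "rain", "snow", "fog"]


def run_scenario(scenario):
    """Placeholder for a simulator run; it only echoes its inputs."""
    terrain, agents, weather = scenario
    return {"terrain": terrain, "agents": agents,
            "weather": weather, "completed": True}


# Every combination of terrain, agent count, and weather: 3 * 3 * 4 cases.
scenarios = list(product(TERRAINS, AGENT_COUNTS, WEATHERS))

# Run the scenarios concurrently; results come back in scenario order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_scenario, scenarios))

print(len(scenarios), "scenarios,", sum(r["completed"] for r in results), "completed")
```

Adding one more parameter with ten values multiplies the number of cases by ten, which is why a sweep that is trivial to describe quickly becomes impractical on a physical test track.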

How much data is too much data?

With autonomous trucks harvesting petabytes of data each day, you’d be forgiven for thinking that the more data we have, the better we’ll be.

But there can, in fact, be such a thing as too much data. For example, when you have more data than you know what to do with, you can experience congestion – that is, the inability to process everything that’s being collected – as well as additional costs and longer lead times. Put simply, with more data, more needs to be handled at every step of the process. In addition, all parts must be compliant with data privacy regulations. Another lengthy process. That’s why, when it comes to data, quality beats quantity. Excessive data just becomes a waste of time and resources.

Analyzing a single data set is far simpler than analyzing multiple data sets – but we can’t always take the easiest route. Conflicting data is a further complication: that is when one sensor senses one thing and another senses something else. We call this diversity and have multiple sensors for this very purpose, but it does complicate the process.

However, when you untangle these data sets, they become very useful. And despite the additional effort, the output of multiple data sets is superior to that of a single set because diversity always wins – even in the data dimension.
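As a small illustration of untangling conflicting sensor data, the sketch below takes the median of several distance readings as a consensus and flags any sensor that strays too far from it. This is a deliberately simple technique chosen for the example, not a description of our sensor fusion; the sensor names, readings, and tolerance are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass
class Reading:
    sensor: str        # hypothetical sensor name
    distance_m: float  # distance to the nearest obstacle, in metres


def fuse(readings: List[Reading],
         tolerance_m: float = 0.5) -> Tuple[float, List[str]]:
    """Return the median distance and the sensors that disagree with it."""
    consensus = median(r.distance_m for r in readings)
    outliers = [r.sensor for r in readings
                if abs(r.distance_m - consensus) > tolerance_m]
    return consensus, outliers


readings = [
    Reading("lidar", 12.1),
    Reading("radar", 12.4),
    Reading("camera", 19.0),
]
consensus, outliers = fuse(readings)
print(consensus, outliers)
```

With three diverse sensors, one faulty reading can be outvoted and identified; with a single sensor, the same fault would go unnoticed.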

Does more data mean ‘more safe’?

To answer the question simply: not necessarily, but it could be the case. To reach a higher level of safety, we need both more data and more skill in handling it, so that we process it in the best possible way and draw the right conclusions.

So, more data does not automatically lead to ‘more safe’, but it’s impossible to claim a safe product without data.

Everything we do is evidenced with data – Volvo Autonomous Solutions is built from the ground up with data as a foundation and enabler. Therefore, to stay competitive, data must always remain a high priority.

We’re all in this together

Putting it bluntly, companies with good data methodology will outcompete those that fail to excel in this area. At Volvo Autonomous Solutions, it’s a reason for us to exist and we strive to specialize in everything that’s unique in autonomous data.

I believe one of the biggest opportunities when it comes to data lies in how collaboratively we in our industry share it with one another. Standardization of formats, refinement processes, and implementations for the autonomous industry would all promote and accelerate cooperation without compromising competition. That, in turn, would bring us one step closer to delivering the benefits of autonomous transport – better safety, greater efficiency, and sustainability – to our industry.
