Do look down: the benefits and challenges of surface-based localisation

Dr Tim Lukins




"Where am I?" - This is the (often quoted) starting point at the centre of the operation of many of today's robotics, connected and autonomous vehicles (CAV's), and automated inspection solutions.

From a known position, all of these systems then traditionally follow a well-known approach:

  • first, record synchronised measurements accurately in space and time;

  • then, measure and assess the situation or state of the world;

  • finally, plan what to do next.

Seems simple, right?

But what if you don't really know where you are? Determining this is the process known as localisation - and without it you fall at the first hurdle. Perhaps your existing positioning sensors (GPS, LiDAR, visual cameras, etc.) have limitations in their accuracy and performance. But how can you tell? What data can you trust? Is there another source of position you can use that is independent of these?

At Machines With Vision, this problem of being able to determine truly accurate location has been our focus. And we've been tackling it (quite literally) from the ground up.

In most systems the starting point for measuring position is inherently top-down, based on Global Navigation Satellite Systems (GNSS) - along with further enhancements to GNSS such as Real-Time Kinematic (RTK) positioning or Precise Point Positioning (PPP), which offer accuracy down to 10 cm or better.

However, due to slow convergence times these augmented GNSS approaches can fail to work at speed; they are limited by the need for supporting infrastructure and communication; and they do not work where overhead line-of-sight is denied (e.g. tunnels, indoor spaces, car parks). They are also at considerable risk from spoofing and other security attacks.

Finer navigation is then often possible by utilising high-definition maps (HD-maps) of the immediate environment. Such maps can be constructed from active sensing technologies, particularly LiDAR, or by passive sensing approaches, for example dense stereo or monocular depth estimation via deep learning with traditional cameras.

Most often this presents a significant challenge: dealing with - and making sense of - the tremendous wealth of data that must be processed in real time if it is to be useful. The problem is compounded by the dynamic, rapidly changing nature of the local environment, in which other vehicles and people may be moving or shifting position from day to day, alongside changes in lighting governed by the time of day or year, and the inclemencies of the weather, which can radically affect a sensor's capabilities.

Even finer control can be added internally by the use of accelerometers and other inertial measurement unit (IMU) sensors that measure instantaneous acceleration and rotation; or by wheel encoders that measure the rotation of the wheels, given a known constant tyre diameter. However, such sensors are subject to considerable accumulation of error, so approaches based on this kind of "dead reckoning" rapidly end up out of position.
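To make that drift concrete, here is a minimal dead-reckoning sketch in Python: it integrates wheel-encoder ticks and a gyro heading into a 2-D pose. The encoder resolution, wheel diameter, and noise figures are purely illustrative assumptions, not real sensor specifications.

```python
import math
import random

# Minimal dead-reckoning sketch: integrate wheel-encoder ticks and a
# gyro-derived heading into a 2-D pose. All parameters are illustrative.
TICKS_PER_REV = 1024          # encoder resolution (assumed)
WHEEL_DIAMETER_M = 0.60       # assumed constant tyre diameter

x, y, heading = 0.0, 0.0, 0.0

def step(ticks, yaw_rate, dt):
    """Advance the pose by one encoder/IMU sample."""
    global x, y, heading
    # Distance rolled this interval, from tick count and wheel geometry.
    dist = (ticks / TICKS_PER_REV) * math.pi * WHEEL_DIAMETER_M
    heading += yaw_rate * dt          # gyro-integrated heading
    x += dist * math.cos(heading)
    y += dist * math.sin(heading)

# Simulate driving straight: small per-sample errors (encoder noise plus a
# tiny gyro bias) accumulate into an ever-growing position error.
for _ in range(1000):
    noisy_ticks = 100 + random.gauss(0, 2)         # encoder noise
    noisy_yaw = random.gauss(0.001, 0.002)         # gyro bias + noise
    step(noisy_ticks, noisy_yaw, dt=0.1)

true_x = 1000 * (100 / TICKS_PER_REV) * math.pi * WHEEL_DIAMETER_M
print(f"drift after 1000 steps: {math.hypot(x - true_x, y):.2f} m")
```

Run it a few times and the final drift varies but always grows with distance travelled - exactly the behaviour that makes pure dead reckoning unusable on its own.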

Naturally, combining GNSS and HD-map style solutions with these inertial sensor modalities is the generally accepted way to overcome some of the shortfalls - often leading to impressive results. This combined approach - most commonly known as sensor fusion - is considered the most principled way of handling the uncertainties and incorporating multiple estimates of position.
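As a hedged illustration of what fusion buys you, here is a minimal one-dimensional Kalman filter sketch: odometry drives the prediction, and sparse GNSS fixes bound the otherwise growing uncertainty. The noise values and update rates are assumptions chosen purely for illustration.

```python
import random

# Minimal 1-D Kalman-filter sketch of sensor fusion: a drifting odometry
# speed estimate drives the prediction; sparse GNSS fixes correct it.
x_est, p = 0.0, 1.0           # position estimate and its variance
Q = 0.05                      # process noise added per odometry step (assumed)
R_GNSS = 4.0                  # GNSS measurement variance, ~2 m std dev (assumed)

def predict(speed, dt):
    """Propagate the estimate forward using odometry."""
    global x_est, p
    x_est += speed * dt
    p += Q                    # uncertainty grows while dead reckoning

def update(z):
    """Correct the estimate with an (infrequent) GNSS position fix."""
    global x_est, p
    k = p / (p + R_GNSS)      # Kalman gain: weigh estimate vs. GNSS noise
    x_est += k * (z - x_est)
    p *= (1.0 - k)            # uncertainty shrinks after each fix

for t in range(100):
    predict(speed=10.0, dt=0.1)          # 10 m/s odometry at 10 Hz
    if (t + 1) % 10 == 0:                # a GNSS fix arrives at only 1 Hz
        true_pos = (t + 1) * 1.0         # simulated ground-truth position
        update(true_pos + random.gauss(0.0, 2.0))

print(f"fused position: {x_est:.1f} m (variance {p:.2f})")
```

The key point is visible in the variance: it ratchets up between fixes and collapses at each one, so the fused estimate stays bounded where dead reckoning alone would wander off.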

But what all these solutions miss is another incredibly reliable source of location information.

Instead, rather than looking up for GNSS, or looking around with stereo cameras or LiDAR to match against an HD-map, why not look down at the ground surface itself - at the fine and fixed structure apparent there (as shown in the image below)?

[Image: close-up of the fine, fixed texture of a ground surface]

It may not come as a surprise that a number of earlier solutions have sought to utilise this ground-texture information - and have shown it can be made to work.

An excellent - and very up-to-date - summary of the many useful properties of ground surfaces as diverse as different grades of asphalt, concrete, and granite is presented in the work of Zhang, Finkelstein, and Rusinkiewicz [1]. The authors posit the key observation that while many ground textures may look random and homogeneous, they all contain persistent imperfections which - if they can be imaged and their details resolved - can be used as local features to identify a location.
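To give a flavour of that idea, the sketch below shows the generic feature-matching recipe applied to ground images: detect local features in a live frame, match them against a stored map patch, and count the reliable correspondences. This illustrates the general approach rather than the authors' exact pipeline, and the file names are placeholders.

```python
import cv2

# Sketch of the core idea: persistent surface imperfections act as local
# features that can be matched against a pre-built map of ground images.
orb = cv2.ORB_create(nfeatures=2000)

# Placeholder file names - substitute real ground-facing imagery.
map_img = cv2.imread("map_patch.png", cv2.IMREAD_GRAYSCALE)
query_img = cv2.imread("live_frame.png", cv2.IMREAD_GRAYSCALE)

kp_map, des_map = orb.detectAndCompute(map_img, None)
kp_q, des_q = orb.detectAndCompute(query_img, None)

# Hamming distance suits ORB's binary descriptors; Lowe's ratio test
# (0.75 is a common rule of thumb) discards ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_q, des_map, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(f"{len(good)} reliable texture matches")
# With enough good matches, the query pose within the map patch can be
# estimated, e.g. via cv2.estimateAffinePartial2D on the matched points.
```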

As a practical demonstration, the Ranger project carried out at the Southwest Research Institute [2] performed feature-based localisation entirely from a vertically mounted, ground-facing camera. The team sought to create a system capable of working at higher speed, and consequently built a strobe-synchronised lighting system coordinated with the high-speed camera. On this basis, and with a simpler approach to matching, Ranger was able to achieve reasonable speeds and control a full-sized CAV to perform high-precision route-following.

More recently, the work of [3] presents a more principled approach to localisation and mapping based on the road surface, combined with sensor fusion of GNSS data. The authors do not use this directly to control the motion of a vehicle; instead they demonstrate the overall accuracy with which the resulting map is built up, comparing it against satellite imagery of the same route to show its validity.

These examples show that the approach can be made to work - but the reality of doing so is still prone to a number of additional challenges.

After all, if you are looking down to know where you are, three questions naturally follow:

  • What if you cannot see the ground surface? What if it snows - or debris and other obstructions get in the way?

  • What if the ground surface changes? What if your road gets resurfaced or your rail-track ballast gets tamped?

  • What if the ground surface passes by too quickly? Surely at higher speeds this blurring will only get worse?

Regarding the first question - yes, if you can't see the majority of the surface features your original map was built from, you won't be able to match and say with certainty where you are. However, if you can see some features - even occasionally - then these intermittent matches will act to relocate you and correct any drift. Furthermore, even temporary features on the surface can still be tracked and used for visual odometry - and so contribute a valuable speed estimate.
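As a sketch of that last point: even without matching against a map, the frame-to-frame shift of whatever texture is visible can be recovered and converted into a speed estimate. The snippet below uses phase correlation for this; the ground sampling distance and frame rate are assumed figures for illustration only.

```python
import cv2
import numpy as np

# Minimal visual-odometry sketch: phase correlation recovers the pixel
# shift between two consecutive ground-facing frames, which converts to a
# speed estimate. The camera scale and frame rate below are assumptions.
GSD_M = 0.0005        # assumed ground sampling distance: 0.5 mm per pixel
FPS = 200.0           # assumed camera frame rate

def speed_from_frames(prev, curr):
    """Estimate speed over ground from two consecutive grayscale frames."""
    a = np.float32(prev)                 # phaseCorrelate needs float input
    b = np.float32(curr)
    (dx, dy), _response = cv2.phaseCorrelate(a, b)
    shift_px = np.hypot(dx, dy)          # pixel displacement between frames
    return shift_px * GSD_M * FPS        # metres per second

# prev, curr = <two consecutive ground-facing frames as numpy arrays>
# print(f"speed estimate: {speed_from_frames(prev, curr):.2f} m/s")
```

Note this works on any transient texture - leaves, wet patches, shadows - because only the inter-frame shift matters, not recognising a mapped feature.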

On the second point, roads are constructed with graded, robust, textured surfaces designed for traction, water dispersal, and noise reduction; and railway lines use ballast (or slab) to solidly support the rails. Their durability is defined by the rate at which these surfaces degrade - depending on the volume of traffic over them and the natural environment - and this is mostly a gradual decay over many years. It is true that if you then rip it all up, you will have to pass over that surface at least once again to regain any absolute location (but you will still benefit immediately from an estimate of odometry).

Finally, on the last point - all such systems necessitate reasonably close proximity to, and motion of, the surface relative to the vehicle. This can have many advantages, the main one being that closer range leads to greater precision. But traditional cameras can struggle to achieve an exposure time short enough to correctly take a "snapshot" of the surface, often at the cost (and power) of additional strobe lighting. Furthermore, such a system will then need further power for the processing required to keep up with the intense flow of image data.
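The blur question itself comes down to simple arithmetic: blur distance is just speed multiplied by exposure time. The figures in the sketch below (speed, exposure, and pixel footprint on the ground) are illustrative assumptions.

```python
# Motion blur = speed x exposure time. With illustrative numbers: at
# motorway speed, only a very short (strobe-lit) exposure keeps the blur
# below one pixel of surface detail.
speed_mps = 30.0            # ~108 km/h (assumed)
exposure_s = 10e-6          # 10 microsecond strobe-lit exposure (assumed)
gsd_m = 0.0005              # 0.5 mm per pixel on the ground (assumed)

blur_m = speed_mps * exposure_s
print(f"blur: {blur_m * 1000:.2f} mm = {blur_m / gsd_m:.2f} px")
# -> blur: 0.30 mm = 0.60 px. A 1 ms exposure at the same speed would
#    smear 30 mm (60 px) - hence the need for strobing or fast sensors.
```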

So, ultimately, if we can rely on the surface being mostly visible and largely unchanging, and can image it at the speed of travel - then we can use it.

The final challenge is then how to reliably and invariantly extract the signature (or "fingerprint") of that surface as we pass over it.

At MWV we have a unique (and patented) solution to this – but more of that in another post.


References:

[1] "High-Precision Localization Using Ground Texture", Linguang Zhang, Adam Finkelstein and Szymon Rusinkiewicz, 2019, International Conference on Robotics and Automation (ICRA). https://gfx.cs.princeton.edu/pubs/Zhang_2019_HLU/index.php

[2] “Ranger: A ground-facing camera-based localization system for ground vehicles”, Kristopher Kozak and Marc Alban, 2016, https://www.swri.org/press-release/swri’s-ranger-localization-technology-allows-precise-automated-driving

[3] "AEKF-Based 3-D Localization of Road Surface Images with Sparse Low-Accuracy GPS Data", Diya Li, Yazhe Hu and Tomonari Furukawa, 2018, IEEE 88th Vehicular Technology Conference. https://www.semanticscholar.org/paper/AEKF-Based-3-D-Localization-of-Road-Surface-Images-Li-Hu/cdf527973457d6173cfb51f7df60c790ad102d40
