Technologies and Principles
A scattergun of Geospatial & EO reflections from two weeks in Europe.
Over the last couple of weeks, I was lucky enough to attend both the Living Planet Symposium (LPS) in Vienna and the High-Level Experts Group (HLEG) meeting, “Towards a Big Data revolution for the Planet: From Uncertainty to Opportunity,” in Frascati. Both events were inspiring and enriching.
This post is a scattergun of thoughts inspired by those two events. At the HLEG, I acted as a rapporteur and facilitator for a tech and innovation session and for two of the three deep dives. As a result, I have a tremendous number of notes and photos of slides. If you want more specific observations, please comment or reach out directly.
Given that this event was subject to the Chatham House Rule, all specific commentary is anonymized, and I will try to provide an overview of the tone. My thoughts are focused, as you might expect, more on technology and execution than on policy. I’m staying in my lane for now!
Geospatial data is a mess. Maybe not locally, but globally. What I mean is that different organizations store and manage data differently, with various standards and different products managed by disparate procedures. We heard stories of files being a pain for small companies, of open data portals, and of numerous sensors capturing multiple phenologies of data for single locations, all leading to data management and harmonization issues (nightmares?).
We heard about too much data in some places and data gaps in others. There is a demand to move from fragmentation towards more holistic data management and de-siloing. However, while the word “federation” is used frequently, in practice it often just means replication. There is value in redundancy, but too much replication is a source of error, cost, power, compute, and intellectual inefficiency.
The notion of data federation is interesting. On reflection, I wonder whether the cloud-native approach would be to publish a reference to where the data lives: in effect, an API proxy. How many times across the web has the Sentinel archive been replicated? Some redundancy is probably helpful and even prudent, but how much redundancy is too much?
This reminds me of the lament of the lonely pixel, just repeated numerous times.
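To make the federation-by-reference idea a little more concrete, here is a minimal sketch in Python. Every class, identifier, URL, and digest is illustrative rather than any real catalogue API; the point is simply that the catalogue holds a pointer and a checksum, not yet another copy of the archive.

```python
# A minimal sketch of "federation by reference": the catalogue stores only a
# pointer to the authoritative copy (plus a checksum), and consumers resolve
# that pointer instead of replicating the archive. All names are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetReference:
    dataset_id: str     # e.g. a Sentinel-2 scene identifier
    canonical_url: str  # where the authoritative copy is published
    sha256: str         # checksum so consumers can verify what they fetched


class FederatedCatalogue:
    """Holds references, not copies: an API proxy rather than a mirror."""

    def __init__(self) -> None:
        self._refs: dict[str, DatasetReference] = {}

    def register(self, ref: DatasetReference) -> None:
        self._refs[ref.dataset_id] = ref

    def resolve(self, dataset_id: str) -> DatasetReference:
        # The caller streams bytes from canonical_url on demand;
        # nothing is duplicated into the catalogue itself.
        return self._refs[dataset_id]


catalogue = FederatedCatalogue()
catalogue.register(DatasetReference(
    dataset_id="S2B_MSIL2A_20250601_EXAMPLE",
    canonical_url="https://example-archive.eu/s2/S2B_MSIL2A_20250601.zip",
    sha256="d2a1...",  # placeholder digest, purely illustrative
))
print(catalogue.resolve("S2B_MSIL2A_20250601_EXAMPLE").canonical_url)
```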
We heard the fearsome phrase “the freedom to lie.” It was the first time I had heard the observation, and I mentally categorize it alongside the words “alternative truth.”
From an Earth Observation (EO) perspective, I am filled with a sense of dread. I don’t believe we are equipped for the increasing anti-science rhetoric, bots, and deepfakes that we will undoubtedly have to contend with. In my naivety, I feel our community has largely been immune to, or ignored by, nefarious manipulation. But that will almost certainly change. How do we ensure that a captured pixel is the truth we expect it to be? Is there a technical solution scalable enough for pixel or vertex provenance? Is there a human solution to the issue of trust?
This question can also be considered from the more general geospatial perspective of data processing. What tools do we have for recording the history of data preparation or algorithmic application? This will matter more and more as Artificial Intelligence (AI) tools spread and as we note the difference between data and the information derived from it. Being able to document what has happened to pixels will become essential: regulators within the financial sector will demand it, and those motivated by transparency will also need to see this algorithmic chain of custody.
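As a thought experiment, a chain of custody could be as simple as an append-only log in which every processing step records the input digest, the algorithm and its version, and the output digest. The sketch below is purely illustrative; a real system would more likely build on an established provenance standard such as W3C PROV.

```python
# A minimal sketch of an "algorithmic chain of custody" for pixel data:
# every processing step appends a record linking the input digest, the
# algorithm and its version, and the output digest. Names are illustrative.
import hashlib
import json
from datetime import datetime, timezone


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_step(chain: list[dict], input_data: bytes, output_data: bytes,
                algorithm: str, version: str, parameters: dict) -> None:
    chain.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "algorithm": algorithm,
        "version": version,
        "parameters": parameters,
        "input_sha256": digest(input_data),
        "output_sha256": digest(output_data),
    })


# Illustrative use: raw pixels in, atmospherically corrected pixels out.
chain: list[dict] = []
raw = b"raw sensor bytes"
corrected = b"corrected bytes"
record_step(chain, raw, corrected,
            algorithm="atmospheric_correction", version="1.2.0",
            parameters={"aerosol_model": "rural"})
print(json.dumps(chain, indent=2))
```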
Consider the technical opportunities and challenges of multi-scale, multi-sensor, multi-source data acquisition, including hyperspectral (with its extra-dense spectral load). We have an astonishing data flow from the surface, sky, and space. This growing torrent leads me to consider using Geospatial Foundation Models (GFMs) to discover hidden spectral signatures in the stacks of pixels and points we collect. Yet the black-box nature of GFMs concerns me, especially given the notes on transparency and algorithmic custodianship above. Then again, traditional methods may simply be too slow and too specific to keep pace with the sheer volume of monitoring data available.
Yet again, I am left thinking about data harmonization, which makes it worth briefly revisiting data standards. With all these sensors, how can we not?
Standards, standards, standards… sigh.
In particular, I keep wondering whether our approach to standards has been sensor-centric rather than phenomena-centric. For instance, when we think about EO activities, do we actually care about the images? Do we care about the pixels, an image, or a strip? The swath of an image is defined by the sensor, not by the landscape reflected in those pixels. In the end, do we care about the sensor or the location? While a captured image can support numerous derived products, it is rarely the final product. The image is more of an unrefined compound, awaiting fractional distillation into information. I’ve said this before, but every pixel is, in fact, a problem to solve. That problem will change with geography and with an expert’s application. But it’s still a problem.
So, should our standards be written for the image and sensor, or for the measured phenomena? Come to think of it, is Overture’s GERS model an example of this? Clearly, GERS is not for EO but for map data more generally.
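To illustrate the sensor-versus-phenomena distinction, here is a toy sketch of location-centric indexing. The crude lat/lon cell ID stands in for a proper scheme (H3, a DGGS, or a GERS-style identifier) and is not how GERS actually works; it simply shows an index keyed by place rather than by scene.

```python
# A minimal sketch of location-centric (rather than scene-centric) indexing:
# observations from any sensor are keyed by a grid cell for the place they
# describe, not by the image they came from. The crude lat/lon cell ID is a
# stand-in for a proper scheme (H3, DGGS, or a GERS-style ID).
from collections import defaultdict


def cell_id(lat: float, lon: float, resolution_deg: float = 0.1) -> str:
    """Snap a coordinate to a coarse grid cell; purely illustrative."""
    return (f"{round(lat / resolution_deg) * resolution_deg:.1f}_"
            f"{round(lon / resolution_deg) * resolution_deg:.1f}")


# Index keyed by place, aggregating observations from many sensors.
observations: dict[str, list[dict]] = defaultdict(list)

observations[cell_id(48.21, 16.37)].append(
    {"sensor": "Sentinel-2", "band": "B04", "value": 0.12, "date": "2025-06-01"})
observations[cell_id(48.21, 16.37)].append(
    {"sensor": "airborne_hyperspectral", "band": "660nm", "value": 0.11,
     "date": "2025-06-03"})

# Asking about the location returns everything we know about it,
# regardless of which swath or sensor produced the measurement.
print(observations[cell_id(48.21, 16.37)])
```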
From a technology and innovation perspective, we often focus on numbers in arrays or databases, but how do we incorporate qualitative information? How do we consider Indigenous stories, for instance?
How can we build any sense of data equity without considering additional, qualitative data sources? Indeed, we could also note that social data is a form of qualitative data; on reflection, there are some interesting examples of social data being put to work (Danti comes to mind).
I have spent the last two weeks discussing new sensors and data sources. All these crucial innovations require significant compute, infrastructure, and intellectual capital. There is a serious question of power consumption and technical capacity/capability that needs to be considered.
For the last few years, I have been hearing more and more about eDNA. It is an interesting tool for ground truth in biodiversity data. If it could be pulled into a citizen-science, crowd-sourced environment, we could build a “23 and me for the planet”…?
Finally, and perhaps most importantly, data being fit for purpose was mentioned time and time again. This is an interesting comment; on reflection, data usually sits on a spectrum of suitability for different purposes. Different spectral, spatial, and temporal characteristics can change where a data product sits on that spectrum for the measurement of a given phenomenon. Given that it takes a few years to become acquainted with EO and geospatial data, I bet a co-pilot for data suitability would be very handy for communities with less access to deep technical experts. Helping new users (and old, for that matter) understand the limitations of different data products within the context of a proposed use case or algorithm could ensure that known limitations are considered up front, perhaps even packaged with a sense of provenance.
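As a toy example of what such a co-pilot might check first, the sketch below compares a product’s spatial, temporal, and spectral characteristics against a use case’s needs and reports known limitations up front. The products, thresholds, and use case are all illustrative.

```python
# A minimal sketch of a "fit for purpose" check: compare a data product's
# spatial, temporal, and spectral characteristics against a use case's needs
# and surface the known limitations up front. Everything here is illustrative.
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    spatial_resolution_m: float
    revisit_days: float
    bands: set[str] = field(default_factory=set)


@dataclass
class UseCase:
    name: str
    max_resolution_m: float
    max_revisit_days: float
    required_bands: set[str] = field(default_factory=set)


def assess(product: Product, use_case: UseCase) -> list[str]:
    """Return a list of limitations; an empty list means no obvious blockers."""
    limitations = []
    if product.spatial_resolution_m > use_case.max_resolution_m:
        limitations.append(
            f"{product.spatial_resolution_m} m pixels are coarser than the "
            f"{use_case.max_resolution_m} m this use case needs.")
    if product.revisit_days > use_case.max_revisit_days:
        limitations.append(
            f"Revisit of {product.revisit_days} days misses the required "
            f"{use_case.max_revisit_days}-day cadence.")
    missing = use_case.required_bands - product.bands
    if missing:
        limitations.append(f"Missing bands: {sorted(missing)}.")
    return limitations


sentinel2 = Product("Sentinel-2 L2A", 10.0, 5.0, {"red", "nir", "swir"})
crop_stress = UseCase("daily crop stress alerts", 20.0, 1.0, {"red", "nir", "thermal"})
for note in assess(sentinel2, crop_stress):
    print("-", note)
```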
Each of these topics is worthy of further exploration, and I am sure I will explore each and the connective tissue between them in the coming months.
The Frascati meeting ended with the collation of a series of principles and pathways, the Frascati Principles, which I will talk about in a future post.
PS. If you get the chance to visit ESRIN, the ESA facility outside Rome, it is a great trip. Go!