Hi, I’m Wouter Janssens, Co-Founder and CEO of Digita. In this episode of Digita's Tech Talks, we zoom in on one of the most fundamental components of the Solid ecosystem: the Solid pod.
As we explained in the previous episode, a Solid pod is a place to store data as resources. Traditionally, most data storage solutions are specialized in handling either unstructured data, semi-structured data or fully structured data.
Examples of unstructured data are images, plain text documents or binary files, which are typically stored on services like Dropbox and OneDrive, or managed in big data lakes. These storage solutions act as a vault for your data: they provide a space where data can be securely stored, retrieved, and maybe linked to, but it is hard to organize and analyse such data.
Structured data, on the other hand, adheres to a fixed, predefined data model, which enables efficient organization and analysis within databases. Typical examples are names and addresses in a contact list, the details of a stock market, or even your social likes. The models that define the structure of this kind of data are formally described, and are agreed upon by several entities. As such, these models form a first layer of interoperability between applications. However, between different models, data interchange is much harder.
In between structured and unstructured data, we also find a variety of semi-structured data, encompassing unstructured data that is annotated with metadata, like emails and tagged photos, as well as structured data that describes its own model, often in CSV, JSON, or markup languages such as XML or HTML. To overcome the variability of their minimal structure, this kind of data is often stored in less strict NoSQL stores and complex data warehouses.
In contrast to traditional storage solutions, a Solid pod abstracts away the amount of structure in the data, and can therefore handle the whole range of unstructured, semi-structured and fully structured data. This makes a Solid pod not just a simple vault to store digital files, or a database to store data according to a fixed structure, but a unified storage experience that takes interoperability to a whole new level.
To achieve this, the Solid ecosystem builds on two pillars of the W3C's open web standards: Linked Data and RDF, the Resource Description Framework.
As we already mentioned in the previous episode, Solid data is structured as resources that are identified by HTTP URIs, which are Uniform Resource Identifiers like the hyperlinks used to surf the world wide web. In fact, any locatable document on the web has such a link to identify it. Linked Data, however, requires every data entity, concept and relation to be identified with such a URI as well. This way, the web of documents becomes an interlinked web of data, in which specific pieces of data can be referred to just as simply as we would refer to a website. This allows for easy access to data across multiple Solid pods, in different locations, without any knowledge of the underlying system.
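To make this a bit more concrete, here is a minimal Python sketch of how one URI can name a single piece of data inside a document. The pod address `https://alice.example/` and the path `profile/card` are hypothetical and chosen purely for illustration; a real pod may lay out its resources differently.

```python
from urllib.parse import urldefrag

# Hypothetical URI of a person described inside a profile document on a pod.
person_uri = "https://alice.example/profile/card#me"

# The part before '#' locates the document on the web;
# the fragment after '#' names one specific entity described in it.
document_uri, fragment = urldefrag(person_uri)

print(document_uri)  # https://alice.example/profile/card
print(fragment)      # me
```

The same mechanism that lets a browser find a web page is thus enough to point at one entity within that page's data.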
Earlier we also pointed out how resources in a Solid ecosystem should be represented using the Resource Description Framework. RDF is a deceptively simple W3C standard that models data as statements about such resources. It allows us to describe the data using custom vocabularies and complex ontologies, making the data both human- and machine-readable. We therefore call such data "semantic"; in other words, the data becomes information.
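As a rough illustration of what "statements about resources" can look like, the sketch below models each statement as a subject, a predicate and an object, using only the Python standard library. The two profile URIs are hypothetical, the `name` and `knows` terms come from the public FOAF vocabulary, and the text rendering is a simplified approximation of the N-Triples notation (no escaping, no datatypes), not a full RDF implementation.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property or relation
    obj: str        # a URI or a plain literal value


FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "https://alice.example/profile/card#me"  # hypothetical pod URI
BOB = "https://bob.example/profile/card#me"      # hypothetical pod URI

statements = [
    Statement(ALICE, FOAF + "name", "Alice"),
    Statement(ALICE, FOAF + "knows", BOB),
]


def to_ntriples(s: Statement) -> str:
    """Render one statement in a simplified N-Triples-like form."""
    # Assumption for this sketch: anything starting with "http" is a URI.
    obj = f"<{s.obj}>" if s.obj.startswith("http") else f'"{s.obj}"'
    return f"<{s.subject}> <{s.predicate}> {obj} ."


for s in statements:
    print(to_ntriples(s))
```

Each printed line reads as one small sentence: this resource, has this property, with this value.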
The real power of RDF, however, comes from the ability to standardize these semantics as open data standards: by agreeing on publicly shared vocabularies and ontologies for specific conceptual domains, a wide range of systems and applications can publish, access and share data with each other.
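A small sketch can show why shared vocabularies matter. Assume two hypothetical pods that both describe a person's name with the same FOAF term; one lookup function then works on data from either pod without knowing anything about where it came from. The pod URIs and the tuple representation of statements are illustrative assumptions, not how a Solid server stores data.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Statements as (subject, predicate, object) tuples from two hypothetical pods.
pod_a = [("https://alice.example/profile/card#me", FOAF_NAME, "Alice")]
pod_b = [("https://bob.example/profile/card#me", FOAF_NAME, "Bob")]


def names(statements):
    """Collect subject -> name for every statement using the shared term."""
    return {s: o for s, p, o in statements if p == FOAF_NAME}


# Because both pods agree on the predicate, their data can simply be combined.
all_names = names(pod_a + pod_b)
print(all_names)
```

Had each pod invented its own term for "name", the function would need a separate mapping per source, which is exactly the interchange problem between different models.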
Solid thus gives us an innovative data storage ecosystem that acts both as a data vault and a database. It is centered around web standards that aim towards a new level of interoperability between models, between applications, between systems, between organizations, by aligning how they describe data.