Hello, I am Lauro Vanderborght. This is episode five of Digita’s Tech Talks on Solid. In the previous episode, we discussed how HTTP is used as a network protocol to allow clients and servers to communicate about resources. In this episode, we will talk more about how to identify these resources, using URIs.
When you want to talk about someone or something, that is, when you refer to a resource, you have to express exactly which resource you mean: you have to identify it. Identifiers are essential in any information system, and the Solid ecosystem is no exception. Typical database systems use local identifiers: a row number such as 42 only has meaning inside one particular table. Resources in the Solid ecosystem, however, need universal identifiers: identifiers that uniquely identify a specific resource across any database system or location.
On the Web, we already use a subset of these identifiers, called Uniform Resource Locators, or URLs. We can also use these identifiers to retrieve the resources they identify. In technical terms, this is called dereferencing the URL. This means that each web page’s URL is also its Uniform Resource Identifier, or URI, and anyone can use it to refer to, and retrieve, that web page.
Each URI begins with a scheme name. The scheme describes how to interpret the URI and, if it is a URL, how to access the resource. Within the Solid ecosystem, we typically deal with URLs that use the HTTP scheme, or rather the more secure HTTPS scheme. The rest of a URL describes where the resource is located (via the hostname), and where at that location you can find the document containing the resource (via the path and the query). Finally, you can add a fragment identifier to refer to a specific part of the document. The URIs you see now are all different URIs, and each could identify a different resource.
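To make the URL parts described above concrete, here is a minimal Python sketch that splits a URL into its components with the standard library’s urllib.parse module. The URL on pod.example.org, with its path, query, and fragment, is hypothetical; only the parsing functions are real.

```python
from urllib.parse import parse_qs, urldefrag, urlsplit

# A hypothetical URL: the host, path, query, and fragment are made up for illustration.
url = "https://pod.example.org/profile/card?format=turtle#me"

parts = urlsplit(url)
print("scheme:  ", parts.scheme)            # how to interpret and access the URL
print("hostname:", parts.hostname)          # where the resource is located
print("path:    ", parts.path)              # which document on that host
print("query:   ", parse_qs(parts.query))   # query parameters as a dict of lists
print("fragment:", parts.fragment)          # a specific part of the document

# Two URIs that differ only in their fragment are different identifiers,
# but dereferencing either one retrieves the same document.
me = "https://pod.example.org/profile/card#me"
you = "https://pod.example.org/profile/card#you"
print(me == you)                                 # False
print(urldefrag(me).url == urldefrag(you).url)   # True
```

The fragment is never sent to the server: a client strips it before the HTTP request and then looks up the matching part of the retrieved document itself.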
<sidebar>
To be more complete, there are also IRIs, or Internationalized Resource Identifiers. An IRI is a generalization of a URI. URIs can only contain a limited set of ASCII characters, whereas IRIs allow hostnames, paths, queries, and fragments to contain almost any Unicode character, including Chinese and Japanese characters. In practice, most systems support IRIs, and it’s easy to translate IRIs to URIs and back. This means you can also locate resources using IRIs with internationalized characters; however, URI remains the most commonly used term.
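The IRI-to-URI translation mentioned above can be sketched in Python with the standard library. This is a simplified sketch, not a complete implementation of RFC 3987: it assumes the IRI has no user info or port, converts the hostname with IDNA (Punycode), and percent-encodes the remaining non-ASCII characters as UTF-8 bytes. The example IRI is hypothetical, and percent_encode_non_ascii and iri_to_uri are illustrative helper names.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def percent_encode_non_ascii(text: str) -> str:
    """Percent-encode only non-ASCII characters, using their UTF-8 bytes."""
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in text)


def iri_to_uri(iri: str) -> str:
    """Sketch of an IRI-to-URI mapping; assumes no user info or port in the IRI."""
    parts = urlsplit(iri)
    # Internationalized hostnames are converted with IDNA (Punycode);
    # Python's built-in "idna" codec implements the older IDNA 2003 rules.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    return urlunsplit((
        parts.scheme,
        host,
        percent_encode_non_ascii(parts.path),
        percent_encode_non_ascii(parts.query),
        percent_encode_non_ascii(parts.fragment),
    ))


# A hypothetical IRI with Japanese characters in the host, path, and fragment.
iri = "https://日本語.jp/プロフィール/カード#私"
uri = iri_to_uri(iri)
print(uri)  # pure ASCII: an xn-- host and a %-escaped path and fragment

# Going back: decode the host with IDNA and the path with percent-decoding.
uri_parts = urlsplit(uri)
host_back = uri_parts.hostname.encode("ascii").decode("idna")
path_back = unquote(uri_parts.path)
print(host_back == "日本語.jp", path_back == "/プロフィール/カード")  # True True
```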
The structured data we have been talking about can also contain identifiers that cannot be used to locate the resource on the Web; in other words, we can use non-dereferenceable URIs instead of URLs. URNs, or Uniform Resource Names, are a concrete example: ISBNs and ISSNs, for instance, can be written as URNs that start with urn:isbn: or urn:issn:. </sidebar>
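A quick way to see the difference between a URL and a non-dereferenceable URI is to look at how they parse. In this Python sketch, is_http_dereferenceable is a hypothetical helper and only a rough heuristic: it checks whether the URI uses the http or https scheme and names a host, which is what an HTTP client needs to look it up. The URL is hypothetical and the ISBN in the URN is just an example value.

```python
from urllib.parse import urlsplit


def is_http_dereferenceable(uri: str) -> bool:
    """Hypothetical heuristic: an HTTP client needs an http(s) scheme and a host."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


examples = [
    "https://pod.example.org/profile/card#me",  # hypothetical URL
    "urn:isbn:0451450523",                      # URN with an example ISBN
]
for uri in examples:
    parts = urlsplit(uri)
    print(f"{uri}: scheme={parts.scheme!r}, path={parts.path!r}, "
          f"dereferenceable={is_http_dereferenceable(uri)}")
```

A URL that passes this check can still fail to resolve; the check only says that an HTTP lookup is possible in principle, while the URN has no host to send a request to at all.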
So, URIs are used to identify resources. Resources can be anything: documents served as web pages, or data blobs such as PDFs or JPEG images. But the power of the Solid ecosystem is that resources can also be data themselves. Using URIs, we move from the original Web of documents to the Web of data, also called the Semantic Web. By looking up a URI, people and applications can retrieve data about that resource, in both public and private networks. A resource could, for example, be a structured piece of data that describes me.
In the third episode of the Solid Tech Talks, we already talked about vocabularies: sets of words with an unambiguous, agreed-upon meaning. Each of those meanings is again a resource, and thus has its own distinct URI. The resource that describes me could use these URIs to link me to attributes (my first name, my last name, my birth date), but also to other resources, again using URIs.
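As an illustration of how a resource can link to attributes and to other resources purely through URIs, here is a small Python sketch that writes such statements in N-Triples, one standard RDF syntax. The FOAF terms givenName, familyName, and knows are real vocabulary URIs; the URIs for me and for a friend are hypothetical, and this tiny serializer only handles URIs and plain string literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string value; every other term in a triple is a URI."""
    value: str


def term(t) -> str:
    """Serialize one term: URIs in angle brackets, literals in escaped quotes."""
    if isinstance(t, Literal):
        escaped = (t.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{t}>"


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


FOAF = "http://xmlns.com/foaf/0.1/"
ME = "https://lauro.example.org/profile/card#me"       # hypothetical URI for me
FRIEND = "https://friend.example.net/profile/card#me"  # hypothetical URI for a friend

triples = [
    (ME, FOAF + "givenName", Literal("Lauro")),         # link to an attribute value
    (ME, FOAF + "familyName", Literal("Vanderborght")),
    (ME, FOAF + "knows", FRIEND),                       # link to another resource
]
doc = to_ntriples(triples)
print(doc)
```

Every subject and predicate, and the object of the last statement, is a URI, so anyone who reads this data can look up what each term means.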
Since URIs are universal identifiers, they are very easy to reuse: not only the URIs of vocabulary terms, but also the URIs of the resources that the data describes. This is how we grow our network of data in a decentralized way. If we want to say something about an existing resource, we reuse its existing URI. If we want to say something about a resource that is not described anywhere yet, we create a new URI, and other people can then use that URI when describing the same resource.
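The reuse-or-mint pattern described above can be sketched in a few lines of Python, with statements written as (subject, predicate, object) tuples. Because two independent sources use the same URI for me, their statements can be combined without any coordination, while a resource that is not described anywhere gets a freshly minted URI. The mint_uri helper, its random-fragment naming scheme, and all example.org and example.net URIs are hypothetical; FOAF’s knows and depicts are real vocabulary terms.

```python
import uuid

FOAF = "http://xmlns.com/foaf/0.1/"
ME = "https://lauro.example.org/profile/card#me"       # hypothetical existing URI
FRIEND = "https://friend.example.net/profile/card#me"  # hypothetical existing URI


def mint_uri(document_url: str) -> str:
    """Hypothetical helper: mint a new URI as a random fragment in a document you control."""
    return f"{document_url}#{uuid.uuid4()}"


# Source A: my own profile, using my existing URI.
source_a = [(ME, FOAF + "givenName", "Lauro")]

# Source B: a friend's data, maintained independently. It reuses my URI,
# and mints a new URI for a photo that is not described anywhere else yet.
photo = mint_uri("https://friend.example.net/photos/album")
source_b = [
    (FRIEND, FOAF + "knows", ME),
    (photo, FOAF + "depicts", ME),
]


def statements_about(uri, *sources):
    """Collect every statement, from any source, that mentions the given URI."""
    return [t for source in sources for t in source if uri in (t[0], t[2])]


for statement in statements_about(ME, source_a, source_b):
    print(statement)
```

From now on, anyone who wants to say something about that photo can reuse the new URI instead of inventing another one.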
As we discussed in the previous episode, a single URL on the Web can return different representations of the same resource: an HTML view when users visit it in a browser, a JSON object, an XML document, a JPEG image, and so on. If needed, you can fully tailor your response to what the client requests, using standard HTTP content negotiation.
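To show how a server might pick one representation, here is a simplified Python sketch of server-side content negotiation on the Accept header. It honours q-values and the rule that the most specific matching media range applies, but it ignores other media type parameters and other headers such as Accept-Language. The list of available representations is hypothetical, and None stands for the case where a server would answer 406 Not Acceptable.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media range, q-value) pairs; q defaults to 1."""
    ranges = []
    for item in header.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        if not media_range:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range.lower(), q))
    return ranges


def quality(media_type: str, ranges: list[tuple[str, float]]) -> float:
    """q-value of the most specific range matching media_type (0 if none match)."""
    main = media_type.partition("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2  # exact match, e.g. text/turtle
        elif media_range == f"{main}/*":
            specificity = 1  # type wildcard, e.g. text/*
        elif media_range == "*/*":
            specificity = 0  # full wildcard
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept: str, available: list[str]) -> str | None:
    """Pick the acceptable representation with the highest q-value.

    Ties go to the earlier entry in `available` (the server's own preference).
    A missing Accept header means anything is acceptable: pass "*/*" then.
    """
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best


# Hypothetical representations that one resource could be served in.
AVAILABLE = ["text/turtle", "application/ld+json", "text/html"]

print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", AVAILABLE))  # text/html
print(negotiate("text/turtle;q=0.9, application/ld+json", AVAILABLE))  # application/ld+json
print(negotiate("image/jpeg", AVAILABLE))  # None
```

A browser-style Accept header gets the HTML view, while a data client that prefers JSON-LD gets JSON-LD, all from the same URL.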
In Solid, additional restrictions apply to URLs and how they are used, based on lessons from the last 30 years of Web and Semantic Web experience. For example, to avoid confusion, two URLs that differ only by a trailing slash must not refer to two different resources, and a URL that is no longer in use should not be hijacked to describe a different resource.
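These two URL rules can be made concrete with a small in-memory Python sketch. ResourceRegistry is a hypothetical class, not the API of any Solid server: it refuses to create a resource whose URL differs from an existing one only by a trailing slash, and it remembers deleted URLs so they cannot be reused for a different resource. The example URL is hypothetical.

```python
class ResourceRegistry:
    """Hypothetical in-memory registry enforcing two URL rules (not a real Solid API)."""

    def __init__(self):
        self._live = {}       # slash-insensitive key -> exact URL currently in use
        self._retired = set()  # keys of URLs that were used before

    @staticmethod
    def _key(url):
        # URLs that differ only by a trailing slash share the same key.
        return url[:-1] if url.endswith("/") else url

    def create(self, url):
        key = self._key(url)
        if key in self._live:
            raise ValueError(f"{url} clashes with existing {self._live[key]}")
        if key in self._retired:
            raise ValueError(f"{url} was used before and must not be reused")
        self._live[key] = url

    def delete(self, url):
        key = self._key(url)
        if self._live.get(key) != url:
            raise KeyError(url)
        del self._live[key]
        self._retired.add(key)  # never hand this URL out again


registry = ResourceRegistry()
registry.create("https://pod.example.org/notes/ideas")  # hypothetical URL
try:
    registry.create("https://pod.example.org/notes/ideas/")  # differs only by a slash
except ValueError as err:
    print("refused:", err)

registry.delete("https://pod.example.org/notes/ideas")
try:
    registry.create("https://pod.example.org/notes/ideas")  # reuse after deletion
except ValueError as err:
    print("refused:", err)
```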
Finally, nothing requires all URIs, and the resources they refer to, to be publicly available. You can set up specific authorization schemes so that only specific people or applications can access specific resources. How does that work? Well, that’s something for a future video.