Primary Data, Metadata, and the Topology of the Internet
April 19, 2008
I’ve used the terms “primary data” and “metadata” a few times, and I’m likely to continue doing so. Here are Floridi’s definitions:
Primary data – “These are what we ordinarily mean by, and perceive as, the principal data stored in the [database system], e.g. a simple array of numbers, or the contents of books in a library. They are the data an information management system is generally designed to convey to the user in the first place” (108).
Metadata – “These data are secondary indications about the nature of the primary data. They enable the [database management system] to fulfil its tasks by describing essential properties of the primary data, e.g. location, format, updating, availability, copyright restrictions, etc. In our example, they could be library records, or the page of an Internet search engine” (108).
Also relevant is the following claim: “Primary data need metadata in order to be manageable” (112).
Metadata is what makes a relational database “relational.” It serves to link information A with information B by providing information C, which often says something as simple as “A and B are related.” As such, we are all very familiar with metadata, even if we don’t use the term. Relational databases make heavy use of metadata to manage the messages in your Gmail account, the blog posts on this and similar webpages, and the comments attached to various blog posts. Indeed, the fact that those comments are “attached” is maintained by metadata that stipulates an attachment relation between a comment and a post.
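To make the “attachment” relation concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names, and sample rows are my own assumptions for illustration, not the schema of any real blogging platform. The point is only that the post_id column says nothing about what a comment contains; it records which post the comment belongs to.

```python
import sqlite3


def build_blog_db():
    """Create a tiny in-memory blog database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    # post_id is the metadata: it stipulates which post a comment is attached to.
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, "
        "post_id INTEGER REFERENCES posts(id), "
        "body TEXT)"
    )
    conn.execute(
        "INSERT INTO posts (id, body) VALUES (?, ?)",
        (1, "Primary data and metadata"),
    )
    conn.executemany(
        "INSERT INTO comments (post_id, body) VALUES (?, ?)",
        [(1, "Nice post"), (1, "What about hyperlinks?")],
    )
    conn.commit()
    return conn


def comments_for(conn, post_id):
    """Return the comment bodies attached to one post, oldest first."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE post_id = ? ORDER BY id",
        (post_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling comments_for(build_blog_db(), 1) follows the relation from a post to its comments. Without the post_id column, the comments would still exist as primary data, but nothing would say where they belong.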
And of course, the internet itself is filled with primary data that wouldn’t be navigable without metadata. The blue links you follow in your web browser are implicit statements of relation between the link text and another location on the internet.
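One rough way to see a link as a statement of relation is to pull the pairs of link text and target out of a page. The sketch below uses Python’s standard html.parser module. The class name LinkCollector and the function extract_links are hypothetical names of my own, and the code ignores nested or malformed anchors.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, target) pairs from the <a href="..."> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # target of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor that has a target.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return every (link text, target) relation stated by the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Each pair in the result is a small piece of metadata: it asserts that this text is related to that location.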
Notice, incidentally, that when we talk about a system rich with metadata (like the internet), a topological way of speaking creeps into our voices. We begin using words like “location” and “navigate.”
Floridi sees this clearly:
The space of reason and meaning — including the narrative and symbolic space of human memory — is now externalised in the hypertextual infosphere, and this brings about four more consequences concerning the rhetoric of spatiality. (1) A linear narrative, which is necessarily associated with time, makes room for a multi-linear narrative that is naturally associated with space. In the past, writers constructed their narrative space virtually within the mind of the reader. Now writer and reader live in a common infosphere and the former no longer needs to weave the narrative, diachronically, within the mind of the latter, as an ongoing textual Web, since all signifieds can co-exist synchronically outside, in the public and intersubjective environment represented by the hypertextual infosphere. (2) In this public domain, writing and reading become spatial gestures, and if time still plays a role, this is only as far as the fictional time of narrative is replaced by the real time of information transmission and retrieval. (3) Consequently, a whole new vocabulary develops, one based on extensional concepts borrowed from the various sciences of space: cartography, geography, topology, architecture, set theory, geology and so forth. (4) It follows that logic, broadly understood as the science of timelessness, hence as intrinsically a topo-logy, tends to displace history, broadly understood as the science of the timed, i.e. a chrono-logy, and we have seen in the previous chapter how this radical topologisation of the infosphere may unfortunately lead to the paradox of a forgetful memory (130).
Citations:
Floridi, Luciano. Philosophy and Computing: An Introduction. London and New York: Routledge, 1999.