Eric Miller and the Semantic Web

Courtesy of the University of Mary Washington Computer Science Department, I and an SRO crowd of students, faculty, and staff got to hear a fascinating talk today by Dr. Eric Miller (CSAIL, MIT). There’s no questioning the depth, intelligence, or intensity of his commitment to a data web; he made the best and most inclusive case for the semantic web I’ve yet heard. Inclusive, because his vision is not just about normalizing descriptive vocabularies or getting everyone firmly in the RDF camp. As I understand it, his vision is more about taking the data already implicit on the web and making it explicit, reusable, mashup-able. It’s a persuasive vision, one strongly reminiscent of Vannevar Bush’s and Douglas Engelbart’s desire to find a way to get ahead of our own information-generation and make good, timely use of the knowledge we have already discovered, knowledge that often languishes unread and unremarked because there’s no machine-readable associative trail to lead us there or to answer our queries comprehensively.

That said, and coming from a non-CS point of view, I do continue to have questions about the idea of the “semantic web,” particularly as it seems to me to downplay the semantic energies of the document in favor of the clearer and more specifiable semantic energies of data. My training in the humanities locates meaning in documents, at least in the sense that documents are the things that make the case for meaning, and invite a response to meaning. Data, by contrast, is measurement and observation. Crucial activities, to be sure–I want my physician’s decisions to be data-driven, make no mistake about it. And unlike many contemporary humanists, I do believe in fact, in empirical reality, and in our abilities to be in touch with it (though those abilities are problematic). And yet, I do believe that our documents, particularly our discursive and creative documents, are the things that make the data meaningful. Once I’m cured and hale and hearty, no set of data can ascribe meaning to that condition.

There’s a lot more to my questions, and a lot more to the argument, on both sides. I asked Dr. Miller to specify the distinction between “document” and “data,” and he replied “in the eye of the beholder.” He was being witty, of course, and later on we talked a bit and he admitted that the matter became “philosophical” when one looked into it closely. He invited me to email him with my question, and promised to respond with some links to resources discussing this distinction and its difficulties. He also insisted that his vision of the semantic web was not trying to isolate one vocabulary, but provide a framework for specifying identity, equivalence, and similarity in digital, physical, and conceptual resources. (He also said that the idea should never have been called “the semantic web.”)

The good news is that he very much supports the idea of document/data symbiosis as the web moves forward. The even better news was his advice to us all: don’t try to figure out what all this will be used for, he insisted, because doing so cripples innovation. Trying to specify all the outcomes and uses would have prevented the Web’s emerging at all, much less its fantastic proliferation. There’s a lesson there for the way we think about education as well.

Postscript: I was pleased that several of my Introduction to New Media Studies students were in the room for Dr. Miller’s talk. Their blogging is well underway and has already begun some wonderful exploration. You’ll find the aggregation blog (in its first iteration) at intronewmediastudies08.umwblogs.org. We’ll be building out this site during the term, but at least I’ve got a one-stop for the class’s blogging activity to date. Stop by and enjoy–and if you leave a comment, be sure to leave it on the student’s original post. You can get there by clicking on the author’s name at the bottom of the post.

6 thoughts on “Eric Miller and the Semantic Web”

  1. More and more these days I am hearing about the fact that Web 3.0, whatever that is, will be more semantic web driven (or at least more SW friendly). I don’t necessarily have a problem with vocabularies and structure, and I understand they would make searching and finding resources online more rational and orderly (and easier?). That said, your discussion here about the distinctly different energies of data and the document is an excellent way to imagine the real limitations of a “data driven” web.

    I often sympathize with Patrick’s need for an orderly vocabulary for tags, categories, and richer RSS feeds laden with information. But I somehow wonder if the loose connections that now power some amazing serendipity on the World Wide Web are preferable to a robust engine of metadata, the precision of which could be potentially dangerous and limiting to the interplay of information and ideas. Is the time spent dreaming that everyone will agree upon a common vocabulary reproducing the same idea of a universal scientific language? Where does cultural distinction come in, and how do we account for idioms and neologisms? Structured vocabularies and syntax frame a system, and if you’re like me you often understand that such systems afford certain benefits but are ultimately driven towards control, in this case of language, information, data, etc.

    This certainly happens already, but how would agreed-upon vocabularies begin to erase idioms and repress differences which often account for the magic of documents? Some in your field might call these anomalies poetry. Semantic systems trained on syntax and structure may help us organize our libraries, but how do they account for radically different organizational models that help us re-imagine the order of things? It reminds me of Brian’s discussion of Rick Prelinger’s library at the COSL conference: it is a system unto itself, and it may not be practical for finding what you want, but what you want may no longer be of interest when you are exposed to a different conception of order.

    Very helpful post that I needed right now as I wonder if my constant trepidation with the semantic web is fueled more by zeal than ideas, yet you crystallize the uncertainty beautifully here. Thanks.

  2. “He also insisted that his vision of the semantic web was not trying to isolate one vocabulary, but provide a framework for specifying identity, equivalence, and similarity in digital, physical, and conceptual resources.”

    Exactly, I met Eric along with the rest of the SIMILE team in December, and was hugely impressed with the pragmatic approach they’re taking. Been writing about it ever since.

    http://blog.jonudell.net/2007/12/06/simile-semantic-web-mashups-for-the-rest-of-us/

    http://jonudell.net/talks/cusec/cusec.html

  3. I attended that talk as well and got something completely different out of it. I’m a CompSci major, so my interest was more in the technical side; however, your analysis and the questions from the “other” side were intriguing for me. Having the technical ability to add semantic data to a website is interesting, but even more so is the way that data is used.

    My view of the data/document dichotomy is biased again through how I use it. Documents are created with data, templates are filled in, etc. I guess that’s why I’m a CompSci major. I agree that the document is special because of the things that only a human can add to it (perspective/meaning). I believe that we have to use the computers to add something else that they are efficient at doing (relating/indexing/connecting) and the semantic web is just another tool for us to gather data to then create meaning with.

    For a concrete example, my independent study this semester is trying to create an ‘events website’ for Fburg/UMW, and I’m trying to include semantic data in the design of it. I know how to technically do it, but the hard part is figuring out what data is important, and how to represent it in the most general/applicable way. The talk was interesting in that regard as well, because it spurred my thinking on the matter. It gave me more questions than answers, but I believe that’s a good sign.

    (Sorry my thoughts are a bit jumbled, I’m still trying to figure all this stuff out.)

  4. “Philosophical” notions of what a document is were particularly timely, since the DBpedia email list was discussing it yesterday. Here’s an excerpt:

    >> The meaning of “document” in this context is extremely broad; if we
    >> follow Otlet’s definition of a document as anything which can
    >> convey information to an observer(Buckland 1997), the term would
    >> seem to cover anything which can have a subject.
    >>
    >> By this standard, timbl is a document, but only when someone’s
    >> looking.

    Great digging into data/document. . . Here’s my response:
    http://www.patrickgmj.net/blog/thoughts-on-eric-millers-talk

  5. Pingback: The Transducer

  6. Gardner,
    Nice to see your comments, and to read those by erich h. (one of my students) and the blog entries of your students.

    Perhaps I have too simple a view on all this, but I don’t see an appreciable difference between data and documents.

    “One person’s data is another person’s program” is one of my favorite quotes from Guy Steele; see http://research.sun.com/people/mybio.php?uid=25706 for info about him.
