Google and Yahoo just announced that they're going to support RDFa (and Microformats). Make sure you understand what that means: this is the dawn of the Semantic Web (you could also call it Web 3.0) and the beginning of the end of the old ways of SEO (search engine optimization). In this post you'll find an explanation of what RDFa is, why I say the Semantic Web will now emerge, and why it will soon no longer be important to put buzzwords into headline tags (<h1>Buzzwords here!</h1>). What will matter instead is linking the word your content is about to the RDF resource that defines what that word really means.

Let's talk about Google. Google is THE defining web technology company. Google's PageRank is the most important factor an SEO cares about, and he would do anything to get a high PageRank, since it ranks his site higher in Google search. And Google search is the main entry point through which the average surfer reaches the content he cares about. Ranking high in Google's search is like being well known, and that carries a lot of marketing power. Google (and the other search engines) have been doing their best for well over a decade to understand a given HTML page and its contents, so they know what to show their users when they search for a keyword. But they can never really know what you mean when you write about Big Ben in your blog post. Do you mean the volcanic massif on Heard Island called Big Ben? Do you mean the world-champion show-jumping horse that bore this name? In this article, I marked up these three mentions with their meaning. You can't see it, but Google can. (Admittedly that might be a stupid idea, because Google may now think this post is about these things. Anyway, if you look at the source code, you'll see what I mean, assuming this WordPress blog hasn't shredded what I typed in.)
[Image: Diagram of the LOD datasets]
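
To make the idea concrete, here is a minimal Python sketch of what such invisible markup could look like. It wraps a term in an RDFa span whose `about` attribute names the post (the subject), whose `rel` attribute names the relationship (the predicate) and whose `resource` attribute points to a URI for the intended meaning (the object). The function name `annotate_term`, the example.org post URI and the choice of Dublin Core's `dcterms:subject` as the predicate are assumptions for illustration; the DBpedia URI is assumed to identify the London landmark.

```python
from html import escape

# Dublin Core terms namespace; dcterms:subject means "the topic of the resource".
DCTERMS = "http://purl.org/dc/terms/"


def annotate_term(label: str, post_uri: str, meaning_uri: str) -> str:
    """Hypothetical helper: wrap `label` in an RDFa span stating
    <post_uri> dcterms:subject <meaning_uri>.

    The reader still sees only the label; a crawler that understands
    RDFa can read the triple from the attributes.
    """
    return (
        f'<span xmlns:dcterms="{DCTERMS}" '   # declare the CURIE prefix used in @rel
        f'about="{escape(post_uri)}" '        # subject: this blog post
        f'rel="dcterms:subject" '             # predicate: "is about"
        f'resource="{escape(meaning_uri)}">'  # object: what "Big Ben" means here
        f"{escape(label)}</span>"             # visible text, HTML-escaped
    )


if __name__ == "__main__":
    # Example URIs: the post URI is hypothetical.
    print(annotate_term("Big Ben",
                        "http://example.org/blog/rdfa-post",
                        "http://dbpedia.org/resource/Big_Ben"))
```

Pointing the same label at a different URI (say, one identifying the horse) is all it takes to change what the word means to a machine, while the text on the page stays identical.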

Most people who want to “invent” the Semantic Web, like Qimaya, think they can derive semantic meaning from web pages by emulating a human brain that simply understands the words. They sell this idea to investors who don't like technical terms like RDF or ontologies, because those investors hope the Web will become “semantic” by magic rather than through hard-to-understand science. If that were possible … don't you think Google would have implemented it already? Never mind … we all need some fantasy.

RDFa is a way of embedding RDF into HTML. RDF, the Resource Description Framework, makes it possible to define semantic meaning. In RDF, you have a subject, a predicate and an object, as in real-world speech. You could say “This article (the subject, identified by its unique URI) is about (a predicate from a standard vocabulary) the Semantic Web (the object)”, and Google could rank your article much higher when someone wants to know something about the Semantic Web and searches for that keyword. You could also use these triple sentences to connect defined resources to each other: “I'm interested in the Semantic Web” could be a triple you put directly into your HTML, and Google could derive a logical connection from it.
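
Stripped of the surrounding HTML, the two example sentences are nothing more than triples of URIs. The following minimal Python sketch writes them out in the N-Triples format, one `<subject> <predicate> <object> .` line per statement. The article and person URIs under example.org are hypothetical; I assume Dublin Core's `dcterms:subject` for “is about”, FOAF's `foaf:topic_interest` for “is interested in”, and DBpedia's URI for the Semantic Web as the shared object.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the relationship
    obj: str        # URI of the related resource


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize URI-only triples as N-Triples lines: <s> <p> <o> ."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


SEMANTIC_WEB = "http://dbpedia.org/resource/Semantic_Web"

triples = [
    # "This article is about the Semantic Web" (article URI is hypothetical)
    Triple("http://example.org/blog/rdfa-post",
           "http://purl.org/dc/terms/subject",
           SEMANTIC_WEB),
    # "I'm interested in the Semantic Web" (person URI is hypothetical)
    Triple("http://example.org/people/me#me",
           "http://xmlns.com/foaf/0.1/topic_interest",
           SEMANTIC_WEB),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```

Because both triples share the same object URI, a machine can connect the article and the person through the Semantic Web resource without understanding a single English word.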

Google will start with a few use cases, such as ratings. Say you define your blog post as a rating of a product, such as a particular laptop, and you identify that product unambiguously with a unique ID (in fact a unique URI, perhaps the product page on the laptop vendor's site). Then Google has a lot of information it can parse directly from your website: the post is about a certain laptop, you have rated it, and maybe you also give some information about yourself. This is all machine-extractable structured data that web spiders like Google's can use. With such definitions of products, ratings, companies and people, much of the central data that people search for on Google can be extracted automatically from content written by average users like you and me, if we know how to embed that data. This might become a central interest of every SEO out there: understanding RDF and implementing RDFa in web pages might be the next big thing in terms of “Semantic Page Rank”.
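
To illustrate what “machine-extractable” means, here is a toy Python extractor built on the standard library's `html.parser`. It handles only a small subset of RDFa (`about`, `typeof`, `rel` with `href` or `resource`, and `property` with optional `content`) and is not a conforming RDFa processor. Google publishes its own vocabulary for reviews; the `rev:` vocabulary at example.org, the review and vendor URIs, and the reviewer name below are hypothetical stand-ins.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class ToyRDFaExtractor(HTMLParser):
    """Extracts triples from a tiny subset of RDFa (about, typeof, rel, property).

    Not a conforming RDFa processor: it ignores blank nodes, chaining, @src,
    space-separated multiple values and base URIs, and it expects every
    element to be closed explicitly (XHTML style).
    """

    def __init__(self) -> None:
        super().__init__()
        self.triples: list[tuple[str, str, str]] = []
        # One frame per open element: inherited subject, prefix map,
        # pending literal property (if any) and the text collected for it.
        self._stack: list[dict] = [{"subject": None, "prefixes": {}, "prop": None, "text": []}]

    @staticmethod
    def _expand(curie: str, prefixes: dict) -> str:
        """Turn 'prefix:local' into a full URI if the prefix is declared."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + local
        return curie  # unknown prefix or already a full URI: leave unchanged

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self._stack[-1]
        prefixes = dict(parent["prefixes"])
        for name, value in attrs:
            if name.startswith("xmlns:") and value:
                prefixes[name[len("xmlns:"):]] = value
        # @about sets a new subject; otherwise the parent's subject is inherited.
        subject = a.get("about") or parent["subject"]
        if subject and a.get("typeof"):
            self.triples.append((subject, RDF_TYPE, self._expand(a["typeof"], prefixes)))
        target = a.get("resource") or a.get("href")
        if subject and a.get("rel") and target:
            # @rel links the subject to another resource identified by URI.
            self.triples.append((subject, self._expand(a["rel"], prefixes), target))
        prop = None
        if subject and a.get("property"):
            prop = self._expand(a["property"], prefixes)
            if a.get("content") is not None:
                # @content overrides the element's text as the literal value.
                self.triples.append((subject, prop, a["content"]))
                prop = None
        self._stack.append({"subject": subject, "prefixes": prefixes, "prop": prop, "text": []})

    def handle_data(self, data):
        # Text belongs to every open element still waiting for a literal value.
        for frame in self._stack:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if len(self._stack) == 1:
            return  # stray end tag without a matching start tag
        frame = self._stack.pop()
        if frame["prop"]:
            literal = "".join(frame["text"]).strip()
            self.triples.append((frame["subject"], frame["prop"], literal))


# A hypothetical review post; the laptop is identified by its vendor page URI.
REVIEW_HTML = """
<div xmlns:rev="http://example.org/review-vocab#"
     about="http://example.org/blog/x200-review" typeof="rev:Review">
  <p>Review of <a rel="rev:item" href="http://vendor.example.com/laptops/x200">the X200</a></p>
  <p>Rating: <span property="rev:rating">4.5</span> out of 5</p>
  <p>By <span property="rev:reviewer">Jane Doe</span></p>
</div>
"""

if __name__ == "__main__":
    parser = ToyRDFaExtractor()
    parser.feed(REVIEW_HTML)
    parser.close()
    for triple in parser.triples:
        print(triple)
```

Run on the sample, it yields four triples: the post is a review, it reviews the product at the vendor URI, it has a rating of "4.5", and it was written by "Jane Doe". A real crawler would use a full RDFa processor, but the principle is the same: the meaning is read from attributes, not guessed from the words.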

It's kind of funny that I'm working on an RDFa editor for a semantic blog in my thesis. It seems I'll have to include Google's use cases in it, or it might be “outdated”. If you have any questions, just leave a comment. By the way: Welcome to the Semantic Web!

  1. Google is nowhere near as far along as they would like to be. Even their search engine doesn't really do what it claims to do. I also don't understand, for example, why Dublin Core hasn't caught on, or has only caught on to a limited extent. It would make my work much easier. The “semantic” web is a tricky thing, too, because, funnily enough, German works differently from English when it comes to predicates, not to mention other languages (just think of the more than half of the world's population living in Asia and on the Indian subcontinent). But the semantic web currently being developed assumes English only. So you first have to fall back on multilingual classifications again, which in turn means that search engine recall goes up, but your precision will suffer, because you simply produce too much ballast. But fine, those are more problems on my side than on the computer science side.
    What's interesting, of course, is that Google is supposedly taking this step right now, just as Wolfram enters the market with his new search engine. Let's see what happens…

  2. That's not quite right. I only compared it to actual language because that makes it easier to picture, but it's really about uniquely identified resources (subject and object) that are connected by uniquely identified predicates. If, for example, I say (Big Ben)(located_at)(London) … where each part would have to be replaced by the URI that represents it … then that has nothing to do with a linguistic representation. Multilinguality, by the way, is built into the system. Freebase, for example, solved this by offering a label type for each language; here ( you can find, for example, the names of Big Ben in all the other languages.

