On the Road to the Semantic Web</title> <style> p {color:#0000D1;} </style> </head> <body> <h1 style="color:#0000D1"> On the Road to the Semantic Web </h1><h3 style="color: #0000D1"> Troy Fleischauer, Autumn 2009</h3> <div style = "text-align:center"> <div style = "text-align:left; width: 900px; margin-left: auto; margin-right: auto; margin-top:50px; color: #0000D1;"> <p>The World Wide Web has been evolving since it became mainstream during the 1990s. In the 90s, people of every demographic were coming out of the woodwork to access the Internet for the first time. Microsoft's Windows 95, released in 1995, became one of the best-selling operating systems of its time. The 56K dial-up modem gave millions of users their first experiences on the World Wide Web. Slow dial-up connections and what are known as ‘static web pages’ are the main components of what is <i> now </i> known as ‘Web 1.0’. </p> <p>Currently we are in the midst of what <a href = "http://www.tothepoint.com/">Darcy DiNucci</a> termed, in 1999, “Web 2.0.” Web 2.0 is a more interactive web and includes social networking sites like <a href = "facebookimage.jpg"> facebook.com </a> and <a href = "myspaceimg.jpg"> myspace.com.</a> The page you are reading now is an example of a static page: it is not interactive but strictly 'published' content. The limitation of static pages is that every user receives the same presentation. With sites like Facebook and MySpace, users can store, post, and exchange information and content such as pictures for others to view instantaneously. </p> <p>More recently, the term used to describe the current and near-future web is <a href = "http://www.silicon.com/technology/networks/2007/10/18/wheres-the-real-web-20-39168873/">Real Web 2.0</a>, also known as Web 3.0. 
The idea behind the term '<i>Real</i> Web 2.0' is that many of the attributes of ‘Web 2.0’, a term coined ten years ago, have come or are finally<i> really </i>starting to come to fruition now. An identifying attribute of Real Web 2.0, besides the use of high-speed Internet connections, is the effort to change the formatting of web pages and web sites toward an implementation known as the semantic web. The semantic web is a system that enhances Internet data searches by reading and reacting to ID tags embedded in every aspect of the data on the page. All text, pictures, hyperlinks, audio and video files, books, etc. will be identified by semantic web 'tags.' Each web page or web site whose data is embedded with semantic data (basically cross-referenceable labels) will become part of a higher order of connectivity. The semantic web adds logic to searches, instead of a search returning an endless list of sites and links for the user to sift through. The idea is that “the whim of a human being and the reasoning of a machine coexist in an ideal, powerful mixture.” (Berners-Lee) </p> <p>Semantic web searching is part of the vision of Tim Berners-Lee, who is known for proposing the World Wide Web in 1989 and implementing Hypertext Markup Language (HTML). One could make a valid argument that the archetype of a semantic web was the idea of Vannevar Bush, who in 1945 published an essay called <i>As We May Think</i> that described a personal data machine for finding and saving files. In theory, the documents would be able to find keywords in other documents. The machine would operate like a semantic text finder and saver. Bush called the machine <a href = "t0_story_memex.jpg"> The MEMEX </a>, a name that evokes both memory and an index (much like the later Rolodex). The MEMEX used the <a href = "memex.jpg"> medium of microfilm </a> for storage and information retrieval. Documents were organized by a numbering system for retrieval and editing. 
<a href = "http://en.wikipedia.org/wiki/Memex">The MEMEX</a> was hypertext linking theory limited by the materials and technologies of the 1940s. Bush had a forward-thinking concept but no way to implement it effectively. Although microfilm can be developed with a dry process, it takes up physical space (unlike data on a hard drive or server) and would be a difficult medium in which to implement hyperlinking. It would be very difficult to cross-reference semantic text data efficiently and manage the physical content on microfilm without current software technology. Bush, an engineer and scientist, is also known as an organizer of the Manhattan Project.</p> <p> In 1960, Ted Nelson had a vision to create a hypertext word processor with his project called Xanadu. The Project Xanadu concept was a word processor that could create and display different versions of text documents. Nelson is better known for coining the term 'hypertext' in 1965. In 1982 Nelson coined the term 'transclusion' to describe hyperlinking pieces of electronic data in different documents to each other, a concept closely related to the semantic web. Furthermore, Nelson is known for outlining the concept of the personal computer in 1978 to IBM, which launched its first personal computer three years later.</p> <p>Beginning in 1989, Berners-Lee applied the ideas of hyperlinking to his own creation, Hypertext Markup Language. Berners-Lee implemented the hypertext protocol across networks of computers and created the World Wide Web. As amazing as the World Wide Web is, pioneer Ted Nelson recognized shortcomings. Wikipedia states, "Nelson claims some aspects of his vision are in the process of being fulfilled by Tim Berners-Lee's invention of the World Wide Web, but he dislikes the World Wide Web, XML and all embedded markup." 
Nelson regarded Berners-Lee's work as "a gross over-simplification of his original vision" and wrote that "HTML is precisely what we were trying to PREVENT— ever-breaking links, links going outward only, quotes you can't follow to their origins, no version management, [and] no rights management." Perhaps Nelson's visions were so idealistic that others built less ambitious versions before he could implement them. Project Xanadu is an example: the fully realized product was not released at a historically significant time. However, before 1992 the Xanadu developers, with more capital behind them, undertook a rewrite of Xanadu in <a href = "http://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>, a programming language developed at Xerox PARC. Nelson released the source code for <a href = "http://en.wikipedia.org/wiki/Project_Xanadu">Xanadu </a>in 1998 in an attempt to overturn software patents.</p> <p>Tim Berners-Lee has called the term ‘Web 2.0’ simply “jargon.” The idea of the Internet and the World Wide Web linking open data is what all this Web <i>version x.x </i>jargon attempts to describe. Open data is data open to the general public, with all aspects of the data embedded with ID tags and made available to those searching for information. Online databases are current applications of open data. The semantic web links open data, but the practice is still in a primitive form compared to its potential. Semantic web search tools have always been a part of Berners-Lee's vision, but the movement and the technology have only become more capable over the last ten years.</p> <p>The structure of ‘semantic’ web labeling can be compared to English grammar. Searches for labeled information are performed by searching for tags that have a relation to each other, comparable to the subject, predicate, and object in an English sentence. 
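</p> <p>The subject, predicate, and object pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular semantic web tool works: the Triple type, the match function, and the labels product:1, hasModel, and hasQuantity are all made up for this example, which stores a hypothetical stock record (product 1, model ZX-6, quantity 62) as three-part statements and then searches them by their parts.</p>

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical stock record written as triples. The labels "product:1",
# "hasModel" and "hasQuantity" are invented for this illustration.
store = [
    Triple("product:1", "hasModel", "ZX-6"),
    Triple("product:1", "hasQuantity", "62"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is given.

    A part left as None acts as a wildcard.
    """
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which product is model ZX-6?"
for t in match(store, predicate="hasModel", obj="ZX-6"):
    print(t.subject)

# "How many of product:1 are in stock?"
for t in match(store, subject="product:1", predicate="hasQuantity"):
    print(t.obj)
```

<p>Leaving a part unspecified turns it into a wildcard, which is what lets one query ask for any product with a given model and another ask for any fact about a given product.</p> <p>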
This grammatical analogy coincides with the term ‘triples’ <a href = "http://faculty.washington.edu/tabrooks/SPARQL/watchThisSPARQL.htm">(Brooks)</a>. In a sentence like “Product ID 1 which is model ZX-6 has a quantity of 62 in stock,” the words have a logical grammatical relationship with each other. A semantic retrieval process will process words in this same relational way. The semantic web is meant to understand synonyms for all these words as well. The sentence above would change its state if, say, one of the items were sold: “Product ID 1 which is model ZX-6 has a quantity of <i>61</i> in stock.” Other label attributes can be added, including tags like price and color. All these labeled or ID-tagged items are also known as nodes. Nodes have a parent-child relationship. The embedded 'nodes' (or RDFa) need to be written into the web page during its creation. <a href = "http://en.wikipedia.org/wiki/RDFa">RDFa</a>, or Resource Description Framework in attributes, is a set of attributes that embeds ‘triple’ information from the RDF data model into XHTML documents.</p> <p>Users access the semantic web through a tool that is already available for linking open data: the web browser. Semantic data described with terms like nodes, children, tags, meta-tags, and RDFa is often grouped loosely under the umbrella term microformat. Internetnews.com said in 2008 that “… browsers are working to include some measure of support for microformats -- a simple means of categorizing Web content as metadata.” Metadata is data about data.</p> <p>Accessibility issues are currently abundant because development is uneven across the World Wide Web. Different browsers comply with standards to different degrees. Browsers are being, and will continue to be, programmed to access microformats. 
An acid test conducted at the UW by undergraduate students in 2009 showed that Firefox, Chrome, and Opera were more universally compliant with Web 2.0 standards than Internet Explorer. It may be true that history and bureaucracy play a role in web development. </p> <p>Modern pages are often coded in XML-based markup (as opposed to the older versions of HTML that static pages use). XML is a stricter, more versatile, and more extensible markup language. Combined with scripting, it allows a page to update and change as the user interacts with it. A web developer could change the theme of such a page in a few lines. Those relatively simple changes could be set to affect multiple pages or a whole web site, for example to apply a seasonal color scheme. With HTML alone, each page would need to be tediously altered every time a presentation style was to be implemented. Cascading Style Sheets can be used with HTML to separate the (text) content from the layout formatting, although this was not typical in earlier versions of HTML. Still, even with the latest HTML 4.01 and Cascading Style Sheets, it is left to humans to search through an abundance of returned links and decide which are relevant. The semantic web attaches meaning to each word, and computers can be programmed to reason about and decipher each word. Search queries will look for semantic data and make logical associations such as recognizing and understanding synonyms, or analyzing and converting data in any language to the user’s language automatically.</p> <p>Generally speaking, the World Wide Web is still not at the point where semantic data is perpetually at work in the background of every web application. <a href = "http://en.wikipedia.org/wiki/Ted_Nelson">Ted Nelson</a> holds that a user interface should be “so simple that a beginner in an emergency can understand it within ten seconds.” 
At this time in 2009, linking open data still requires some background understanding to pull off. For now much of the semantic searching is done on databases such as <a href = "http://dbpedia.org/About">DBpedia</a>. However, for better or worse, the future is a web of machines interacting with each other to make logical associations and decisions.</p> <p style = width><b>References</b></p> <p>Berners-Lee, Tim, and Mark Fischetti. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. San Francisco: Harper San Francisco, 1999. Print.</p> <p>Berners-Lee, Tim. "Sir Tim Berners-Lee Talks with Talis about the Semantic Web." Casting Words. Talis, Birmingham: 7 Feb. 2008. Podcast.</p> <p>Brooks, T.A. (200X). "Watch this: Probe the Semantic Web with SPARQL" Information Research, XX(X) paper TBXXXXX.html [Available at http://InformationR.net/ir/XXXXXXXXXXXXXXX.html]</p> <p>Herman, Ivan. "W3C Semantic Web Activity." World Wide Web Consortium (W3C). N.p., n.d. Web. 7 Dec. 2009. &lt;http://www.w3.org/2001/sw/&gt;.</p> <p>Kerner, Sean. "Firefox 3: The Semantic Web Browser? - InternetNews.com." InternetNews Realtime News for IT Managers. N.p., n.d. Web. 7 Dec. 2009. &lt;http://www.internetnews.com/dev-news/article.php/10792_3749861_1&gt;. </p> <p>"Memex - Wikipedia, the free encyclopedia." Wikipedia, the free encyclopedia. N.p., n.d. Web. 7 Dec. 2009. &lt;http://en.wikipedia.org/wiki/Memex&gt;. </p> <p>Quocirca. "Where's the real web 2.0? | Networks | silicon.com." silicon.com | Technology Strategy for CIOs and Business Executives. N.p., n.d. Web. 11 Dec. 2009. &lt;http://www.silicon.com/technology/networks/2007/10/18/wheres-the-real-web-20-39168873/&gt;. </p> <p>"RDFa - Wikipedia, the free encyclopedia." Wikipedia, the free encyclopedia. N.p., n.d. Web. 11 Dec. 2009. &lt;http://en.wikipedia.org/wiki/RDFa&gt;. </p> <p>"Semantic publishing - Wikipedia, the free encyclopedia." Wikipedia, the free encyclopedia. N.p., n.d. Web. 7 Dec. 2009. 
&lt;http://en.wikipedia.org/wiki/Semantic_publishing&gt;. </p> <p>DeRose, Steven. "Structured Information:." The CoverPages. N.p., n.d. Web. 7 Dec. 2009. &lt;http://xml.coverpages.org/deroseStructure.html&gt;. </p> <p>"Ted Nelson - Wikipedia, the free encyclopedia." Wikipedia, the free encyclopedia. N.p., n.d. Web. 7 Dec. 2009. &lt;http://en.wikipedia.org/wiki/Ted_Nelson&gt;. </p> <p>"Vannevar Bush - Wikipedia, the free encyclopedia." Wikipedia, the free encyclopedia. N.p., n.d. Web. 7 Dec. 2009. &lt;http://en.wikipedia.org/wiki/Vannevar