XML.com: Embedded Markup Considered Harmful

This link has been bookmarked by 9 people. It was first bookmarked on 28 Oct 2006, by Mike Wesch.

09 Sep 14

Greg Lloyd
Ted Nelson 2 Oct 1997: I want to discuss what I consider one of the worst mistakes of the current software world, embedded markup; which is, regrettably, the heart of such current standards as SGML and HTML. (There are many other embedded markup systems; an interesting one is RTF. But I will concentrate on the SGML-HTML theology because of its claims and fervor.)

TedNelson Xanadu XML SGML HTLM Web markup hypertext essay
- Embedded Markup
  I want to discuss what I consider one of the worst mistakes of the current software world, embedded markup; which is, regrettably, the heart of such current standards as SGML and HTML. (There are many other embedded markup systems; an interesting one is RTF. But I will concentrate on the SGML-HTML theology because of its claims and fervor.)
- There is no one reason this approach is wrong; I believe it is wrong in almost every respect. But I must be honest and acknowledge my objection as a serious paradigm conflict, or (if you will) religious conflict. In paradigm conflict and religious conflict, there can be no hope of doctrinal victory; the best we can seek is for both sides to understand each other fully and cordially.
- SGML's advocates expect, or wish to enforce, a universal linear representation of hierarchical structure.
- I believe that if this is a factual claim of appropriateness, it is a delusion; if it is an enforcement, it is an intolerable imposition which drastically curtails the representation of non-hierarchical media structure.
- I will turn to general problems of the embedded method. I have three extremely different objections to embedded markup. The first is simple; the second is complicated to explain; and the third challenges the claim of generality.
- Network electronic publishing offers a unique special-case solution to the copyright problem that has not been generally recognized. I call it transpublishing. Let me explain.
- In paper publishing, there are two copyright realms: a fortified zone of copyrighted material, defended by its owners and requiring prior negotiation by publishers for quotation and re-use; and an unfortified zone, the open sea of public domain, where anything may be quoted freely--but whose materials tend to be outdated and less desirable for re-use.
- Transpublishing makes possible a new realm between these two, where everything may be treated as boilerplate (as with public-domain material), but where publishers relinquish none of their rights and receive revenue exactly proportional to use.
- Two different parties have legitimate concerns. Original rightsholders are concerned for their territory of copyrighted material, as defined by law, so that they may maintain and benefit from their hard-won assets. But the public (everybody else, as well as rightsholders in their time off) would like to re-use and republish these materials in different ways.
- What if a system could exist which would satisfy all parties--copyright holders and those who would like to quote and republish? What if materials could be quoted without restriction, or size limit, by anyone, without red tape or negotiation--but all publishers would continue to furnish the downloaded copies, and would be exactly rewarded, being paid for each copy?
- Transpublishing versus Embedded Markup
- Embedded markup drastically interferes with transclusive re-use. For one thing, any arbitrary section of an HTML document may not have correct tags (since the tags overlap and extend over potentially long attribute fields). This means HTML-based transclusion cannot be handled by a simple tag, but probably requires some sort of proxy server.
- This is done all the time in scholarly writing and serious journalism, with phrases like "emphasis mine." It needs to be possible in transpublishing to change emphasis and other attributes by nullifying the original markup. Of course, re-emphasizing through markup is an editorial modification, subject to judgment calls and issues of academic etiquette. But the inquiring reader can always follow the bridge of transclusion to see the original as formatted by the author.
- Alternative method 1: parallel markup
  The best alternative is parallel markup. I believe that sequential formatted objects are best represented by a format in which the text and the markup are treated as separate parallel members, presumably (but not necessarily) in different files.[7]
- Alternative method 2: tag override
  Where it is inconvenient to break out the tags into a parallel stream--i.e., where they're already stuck or published in the original--we may fall back on the method of tag override. By this I mean simply treating the original tags as if they are not there; ignoring them while counting through the contents and furnishing instead a parallel tag stream, as in parallel markup. We do not dislodge the original markup, but simply ignore it.
- Exactly Representing Thought and Change
- My principal long-term concern is the exact representation of human thought, especially that thought put into words and writing. But the sequentiality of words and old-fashioned writing have until now compromised that representation, requiring authors to force sequence on their material, and curtail its interconnections. Designing editorial systems for exact and deep representation is therefore my objective.
- To find the support functions really needed for creative organization by authors and editors, we must understand the exact representation and presentation of human thought, and be able to track the continuities of structure and change.
- This means we must find a stable means of representing structure very different from the sequential and hierarchical--a representation of structure which recognizes the most anarchic and overlapping relations; and the location of identical and corresponding materials in different versions; which recognizes and maintains constancies of structure and data across successive versions, even as addresses of these materials become unpredictably fragmented by editing.
- Thus deep version management--knowing locations of shared materials to the byte level--is a vital problem to solve in the design of editing systems. And the same location management is necessary on a much broader scale to support transpublishing.
- Embedded markup cannot represent this at all, and merely adds obstacles (impeded data structure) to solving these rich addressing problems.
- Three Layers
- I believe we should find a very general representational system, a reference model which breaks apart in parallel what is represented by SGML and HTML. This would make the creation of deep editing and version management methods much easier. By handling contents, structure, and special effects separately in such a reference model, the parts can be better understood and worked on, and far more general structures can be represented.
- I would propose a three-layer model:[8]
- A content layer to facilitate editing, content linking, and transclusion management.
- A structure layer, declarable separately. Users should be able to specify entities, connections and co-presence logic, defined independently of appearance or size or contents; as well as overlay correspondence, links, transclusions, and "hoses" for movable content.
- Finally, a special-effects-and-primping layer should allow the declaration of ever-so-many fonts, format blocks, fanfares, and whizbangs, and their assignment to what's in the content and structure layers.
- I believe that a parallel system of this kind will soon become necessary because of the degree of entanglement and unmanageability of HTML. But we must learn from the recent past and provide sufficient abstractness and generality.
- Theodor Holm Nelson, designer and generalist, has been a software designer and theorist since 1960 and a software consultant since 1967. His principal design work includes Project Xanadu and xanalogical systems, the transcopyright system, and the theory of virtuality design. His industry positions include Harcourt Brace & World publishers, Creative Computing Magazine, Datapoint Corporation, and Autodesk, Inc.; his university positions include Vassar College, University of Illinois, Swarthmore College, Strathclyde University, and Keio University.
- Mr. Nelson has written several books, the most recent being The Future of Information (1997), as well as numerous articles, lectures, and presentations. He is best known for discovering the hypertext concept and for coining various words which have become popular, such as "hypertext," "hypermedia," "cybercrud," "softcopy," "electronic visualization," "dildonics," "technoid," "docuverse," and "transclusion."
- [2] Samuel Latt Epstein of Sensemedia, Inc. has pointed out (personal communication to the author) that he learned graphics programming on the Intecolor ISC-8001, ca. 1976, a machine that had a parallel data structure for its screen. 8K of memory was devoted to the text, 8K was devoted to the corresponding bytes of attribute memory. This "made it a snap" to program the screen, he says. The two parallel banks of memory could be manipulated independently, changing the colors without touching the text and vice versa, which greatly simplified (he says) programming both the text and the various graphical effects of those days.
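
The parallel-markup and tag-override ideas highlighted above can be sketched in a few lines of Python. This is only an illustration, not Nelson's design or any Xanadu format: the `Span` record, the `strip_tags` and `render` helpers, and the sample sentence are all hypothetical, and the regex-based tag stripping is a naive assumption that would not survive real HTML. The text lives in one place and the formatting in a separate stream of offsets, much like the separate text and attribute memory banks described in footnote [2].

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One entry in the parallel markup stream (hypothetical format)."""
    start: int  # offset into the plain text, inclusive
    end: int    # offset into the plain text, exclusive
    attr: str   # formatting attribute, e.g. "em"


# Naive tag matcher: an assumption for this sketch, not a real HTML parser.
TAG = re.compile(r"<[^>]*>")


def strip_tags(marked_up: str) -> str:
    """Tag override: count through the contents as if the tags were absent."""
    return TAG.sub("", marked_up)


def render(text: str, spans: list[Span]) -> str:
    """Apply a non-overlapping parallel span stream to plain text."""
    out, pos = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        out.append(text[pos:s.start])
        out.append(f"<{s.attr}>{text[s.start:s.end]}</{s.attr}>")
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


# The original keeps its embedded tags; the quoting author ignores them
# and supplies a separate emphasis stream ("emphasis mine").
original = "Embedded <b>markup</b> considered harmful"
text = strip_tags(original)
emphasis_mine = [Span(27, 34, "em")]
result = render(text, emphasis_mine)
print(result)
```

The original string is never modified, so a reader could still follow the quotation back to the author's own formatting, while the quoting party's emphasis is carried entirely by `emphasis_mine`.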
17 May 14

Scott Donovan
memex ted nelson
23 May 07

zacchiro
markup overlap xml
- The SGML approach is a delivery format, not a working format. Editing is outside the paradigm, happens "elsewhere."
- Objection 2: Transpublishing a Potential Conflict
- though they must be edited in parallel.
- treating the original tags as if they are not there
- This means we must find a stable means of representing structure very different from the sequential and hierarchical--a representation of structure which recognizes the most anarchic and overlapping relations; and the location of identical and corresponding materials in different versions; which recognizes and maintains constancies of structure and data across successive versions, even as addresses of these materials become unpredictably fragmented by editing.
- three-layer model
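
One possible reading of the three-layer model, as a minimal Python sketch. The essay does not specify a data format, so everything here is an assumption made for illustration: the `Entity` record, the `overlaps` and `styled_runs` helpers, and the sample sentence are hypothetical. The point is only that content, structure, and effects are held separately, and that the structure layer can hold overlapping entities that a single nested tag hierarchy could not express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named region in the structure layer (hypothetical format)."""
    name: str
    start: int  # offset into the content layer, inclusive
    end: int    # offset into the content layer, exclusive


# Content layer: plain text only, no markup.
content = "To be, or not to be, that is the question"

# Structure layer: declared separately; these two entities overlap
# without either one containing the other.
structure = [
    Entity("opening", 0, 19),
    Entity("remainder", 7, 41),
]

# Effects layer: appearance assigned to structure by name.
effects = {"remainder": "italic"}


def overlaps(a: Entity, b: Entity) -> bool:
    """True if a and b partially overlap (neither nests inside the other)."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def styled_runs(content: str, structure: list[Entity], effects: dict[str, str]):
    """Return (text, style) pairs for every entity that has an effect assigned."""
    return [
        (content[e.start:e.end], effects[e.name])
        for e in structure
        if e.name in effects
    ]


print(overlaps(structure[0], structure[1]))
print(styled_runs(content, structure, effects))
```

Changing the `effects` mapping restyles the text without touching `content` or `structure`, which is the separation the three-layer proposal asks for.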
28 Oct 06

Mike Wesch
hypertext nelson xml
- Embedded Markup Considered Harmful
13 Oct 06

eclectics
hypertext
20 Mar 06

David Naughton
Theodor Holm Nelson: 1997-10-02: XML.com

xml markup-languages
18 Dec 04

kavango
"SGML has been extended and extended to fill the universe, becoming a reference language of sequential attributes and now hypertext links and graphics (HTML). Its believers think SGML can represent anything at all--at least, anything they approve of."

metadata xml
24 Aug 04

David Corking
Firefox StuffToBlog Imported Bookmarks