TUNES vs the WWW
An essay that contrasts the concept of Metatext in Tunes with that of the World Wide Web (WWW). WARNING! Very early draft. It is only a bunch of references and needs work to make sense, but it is food for thought anyway. This is not intended to be "politically correct" or even "balanced"; it is my (MaD70) particular point of view, not necessarily endorsed by other members and contributors of TUNES, so beware. [Note that my English prose is horrible, feel free to improve it].
See also Tunes Distributed Publishing by fare and the concept of Metatext in Tunes.
The fundamental realization that The Web is broken (Completely Rethinking the Web), by Dirk Knemeyer (some thoughts on something similar to Metatext).
Some technologies predating the Web that W3C is trying to reinvent (badly):
§ courtesy of the Wayback Machine, see Search Facilities.
As an introduction, see this small article by Tim Bradshaw: The wheel of reinvention§ (old broken link).
From the less formalized, nearer to the user, to the more formalized theories and resulting technologies (I established this hierarchy because I think that implementing the upper levels in terms of the lower ones is convenient and gives a good separation of concerns):
- Knowledge Management/Artificial Intelligence (KM/AI)
See by Jorn Barger: "All marked up and nowhere to go" -- XML confronts the AI Problem§ (old link, now reachable again sometimes).
About Resource Description Framework (RDF) see:
- Jean-Luc Delatre in this conversation: RDF is a monkey wrench?§ (old broken link):
There is no single "true" Ontology, not even finitely many of them because Ontologies are NOT models of the real world but steps in a process of organizing evolving consensus.
This conversation was on a blog by Danny Ayers.
- By Fabian Pascal:
- On Relational Binary Database Design and..
- .. On Metadata, RDF and Relational Representation both with C. J. Date.
- More on XML and RDF.
- [Groupware/Workflow Management? (links?)].
- Information Retrieval (IR)
See by Marcia J. Bates: After the Dot-Bomb: Getting Web Information Retrieval Right This Time (on the "ontology" fallacy and the effectiveness of specialized (vs conventional) dictionaries and thesauruses):
[..] a good analogy is to say that faceted classification is to hierarchical classification as relational databases are to hierarchical databases. Most system designers would not dream of using hierarchical files these days, so why are hierarchical classifications of information content still being used?
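Bates's analogy can be made concrete with a small sketch. The document ids, the facet names (`topic`, `kind`) and the helper functions are all hypothetical, chosen only for illustration: a hierarchical scheme files each item under one fixed path, while a faceted scheme describes it with independent facet/value pairs that can be combined in any order.

```python
# Illustrative sketch only: ids, facets and helpers are assumptions,
# not taken from Bates's article.

# Hierarchical classification: topic first, then kind. The order is baked in.
hierarchy = {
    "Databases": {"Tutorial": ["doc1"], "Paper": ["doc2"]},
    "Markup": {"Tutorial": ["doc3"], "Paper": ["doc4"]},
}

def find_by_path(tree, *path):
    """Follow one fixed path from the root; the caller must know the level order."""
    node = tree
    for key in path:
        node = node[key]
    return node

def all_of_kind_in_hierarchy(tree, kind):
    """Asking by the second level alone forces a walk over every first-level node."""
    return [doc for topic in tree.values() for doc in topic.get(kind, [])]

# Faceted classification: every item carries independent facet/value pairs.
faceted = [
    {"id": "doc1", "topic": "Databases", "kind": "Tutorial"},
    {"id": "doc2", "topic": "Databases", "kind": "Paper"},
    {"id": "doc3", "topic": "Markup", "kind": "Tutorial"},
    {"id": "doc4", "topic": "Markup", "kind": "Paper"},
]

def find_by_facets(items, **wanted):
    """Select by any combination of facets, in any order."""
    return [item["id"] for item in items
            if all(item.get(facet) == value for facet, value in wanted.items())]

print(find_by_path(hierarchy, "Databases", "Paper"))
print(find_by_facets(faceted, kind="Tutorial"))
print(find_by_facets(faceted, kind="Paper", topic="Markup"))
```

In the faceted form no facet is privileged, so a new facet can be added without reorganizing existing entries; in the hierarchy, a new level means rebuilding the tree and every query written against it.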
- Data Management (Distributed DBMS)
- The big problem with XML and relatives, besides the stupid syntax [see below Erik Naggum on SGML syntax about inconveniences from a HCI perspective], is that they are no longer intended only as an interchange format [which is inappropriate even in this role, see below] but as a data model. Evidence of this fact: XML:DB Initiative for XML Databases (old broken link), W3C XML Query, XML Path Language (XPath) Version 1.0 and Version 2.0 (Draft), .. [moreover see below Rita Knox].
I refuse to consider seriously statements like this from XQuery 1.0: An XML Query Language -- W3C Working Draft 04 April 2005:
[Definition: XQuery operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in the.. ]
Consider that:
- A data model is not a logical structure (they mean logical model, I think).
From Something To Call One's Own:
- conceptual model (also referred to as business model or entity-relationship model): model of the persistent data of some particular enterprise.
- logical model: the logical representation in the database of some enterprise-specific conceptual model.
- physical model (or internal model): the physical representations on disk of some enterprise-specific logical model.
- data model: a general theory of data via which enterprise-specific conceptual models are mapped to logical models e.g. relational data model
- As Fabian Pascal wrote in On What Is A Data Model: Reply To Simon Williams:
A succinct, precise definition of a data model is:
A general theory of data which defines structural (organization), integrity and manipulation features.
It is straightforward to specify these precisely for the relational data model:
- Theory: predicate logic and set theory
- Structure: R-tables (precise definition!)
- Integrity: domain, column, table and database integrity
- Manipulation: R-operations (restrict, project, join, etc.)
So, it is not clear in what sense XML has a data model (if at all); the XQuery 1.0 and XPath 2.0 Data Model -- W3C Working Draft 4 April 2005 certainly does not satisfy this definition.
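To make the manipulation component named in the quote less abstract, here is a minimal sketch of restrict, project and (natural) join over relations represented as sets of tuples. The representation, the function names and the `field` attribute are assumptions made for illustration. This is not Pascal's formal definition of R-tables, and integrity constraints are not modelled at all.

```python
# Minimal sketch under stated assumptions: a relation is a set of tuples, and
# each tuple is a frozenset of (attribute, value) pairs. Values must be hashable.

def relation(*rows):
    """Build a relation from dicts; duplicate rows collapse, as in a set."""
    return {frozenset(row.items()) for row in rows}

def restrict(rel, predicate):
    """Keep the tuples for which the predicate holds."""
    return {t for t in rel if predicate(dict(t))}

def project(rel, *attrs):
    """Keep only the named attributes; duplicates disappear automatically."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out

# Sample data: the titles are cited in this essay, the "field" values are made up.
authors = relation(
    {"author": "Codd", "field": "databases"},
    {"author": "Nelson", "field": "hypertext"},
)
works = relation(
    {"author": "Codd", "title": "A Relational Model of Data for Large Shared Data Banks"},
    {"author": "Nelson", "title": "Embedded Markup Considered Harmful"},
)

# Every operation takes relations and returns a relation, so they compose freely.
result = project(
    restrict(join(authors, works), lambda t: t["field"] == "databases"),
    "title",
)
print(result)
```

The point of the sketch is closure: all access is by value, and the result of any operation can be fed to any other.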
PS: One of the big short-comings of hierarchical databases was that they basically offered only navigational APIs and were brittle towards physical optimizations and schema changes. XML is often compared to hierarchical databases (in the negative sense), but these people don't realize that declarative languages such as XQuery or XSLT and the ability to choose different physical designs and optimizations(including streams) without impacting the logical level make it much better than hierarchical databases and much closer to so-called non-first-normal-form (or NF2) models...
Some comments:
- So, XQuery and XSLT are purportedly "declarative" languages; does this solve all the problems? Certainly not, see If You Liked SQL, You'll Love XQUERY, by Fabian Pascal.
Also, see ON JIM GRAY'S “CALL TO ARMS” by Lee Fesperman:
XML and XQuery are insolubly flawed, the former being just another incarnation of the discredited hierarchical data model that RM [Relational data-Model] superseded. XQuery brings the excessive complexity that hierarchy instilled in the products that preceded SQL. A relational query language uses only one method to access and manipulate information—by data values. XQuery has at least three—by value, by navigating the hierarchy and through external linkage. This adds complexity, but no power, exactly the problem that Codd intended to eliminate with RM.
Emphasis mine.
- About the "much closer to so-called non-first-normal-form": which first-normal-form (1NF)? XML documents are not in 1NF, as per relational data model, by definition, not being relations;
if this is the usual argument that relations cannot contain "complex objects", then this quote from ON OBJECTS AND RELATIONS, by C. Date, is pertinent:
What I claim (and have claimed ever since about 1992, when I first realized that to talk in terms of this fuzzy "atomicity" concept was misleading and counterproductive) is that if A is some attribute of some relation R, then A can be defined in terms of absolutely any type (or domain, if you prefer) whatsoever. So you can have attributes whose values are integers, attributes whose values are strings, attributes whose values are polygons, attributes whose values are arrays, attributes whose values are relations, ... and on and on. Of course, it's crucial (as Fabian suggests) that we make a distinction between values of any given type, on the one hand, and the representation of those values under the covers, on the other -- but the idea that these two concepts should be kept rigidly apart isn't one that's peculiar to the relational world.
.. and this also, from DAWN WOLTHUIS’ “PROOF”:
First Normal Form (1NF): In his 1969 paper Codd allowed for domains that had relations as values, or what we refer to as relation-valued domains/attributes (RVD/RVA), “nested relations” for short. He considered RVDs the relational equivalent of “repeating groups” in hierarchic/network systems, which he deemed unnecessary complications. At the time Codd thought that RVDs would necessitate second- rather than first-order logic as a basis for the data language, which (a) is problematic, and (b) would complicate implementation of relational systems (see our forthcoming FOUNDATIONS paper for details). So in his 1970 paper he introduced the idea of eliminating nested relations through a process of normalization, for which he lists some benefits. The result is a collection of relations in their normal form, what we today call 1NF. Otherwise put, Codd’s initial position was that relations can be either defined over RVDs, or in 1NF (no RVD/RVA).
As we explain in What First Normal Form Really Means, it later turned out that RVDs/RVAs can be supported within first-order logic and that, consequently, 1NF essentially means that at every intersection of a tuple and attribute there is exactly one value, which can be anything, including a relation. In this sense, a relation is by definition in 1NF (otherwise put: there is no such thing as an unnormalized relation). Note very carefully, however, that (a) values must be atomic with respect with the operators defined for the respective domains, whether they are relations or not (b) 1NF is only one requirement of a relation (c) there are integrity and manipulation components to the relational model.
[...]
What we also argue in our two papers is what Codd already foresaw in 1970: their later found compatibility with the model notwithstanding, RVAs add complexity, but no power (except perhaps some convenience in rare, specialized cases). Otherwise put, the model does not prohibit RVAs, but they are not a very good idea.
So, one can have a relation with attributes belonging to the XML domain (type), even if this is not very wise in general (it makes sense only if you are forced to deal with XML).
- XML proponents are retracting the much-hyped virtues of a purely textual format for all purposes (data management included), as noted below about the XML binary format.
- Eric Browne's paper The Myth of Self-Describing XML (.pdf) at Ocean Informatics - Document Downloads.
- Was XML Flawed from the Start?, by Carl Sassenrath, CTO of REBOL Technologies.
- Embedded Markup Considered Harmful by Ted Nelson. See also LMNL.
- An irreverent joke by Frank Atanassow The Essence of XML -- A modest proposal: don't underestimate its seriousness.
- See again by Fabian Pascal:
- On Database Design and XML.
- More on XML - "XML's data structure is hierarchic and unless data types, integrity and manipulation are added to it (to produce a full-fledged data model), just a bunch of tags are not sufficient for self-description [see below Rita Knox]. But if the three components are added -- which is what is happening in W3C -- the result will be the same nightmares that prompted us to rid ourselves of hierarchic DBMSs years ago.".
- XML Data Management (The "Future of DBMS"), Part 1.
- The XML Bug - "The hierarchic approach underlying XML does, in fact, have a formal foundation: graph theory. But as the XML 1.0 specification explicitly states [find how and where that is asserted], it does not adhere to to the theory. The reason is the same as that for which old hierarchic database management (e.g., IBM's IMS) eschewed the theory too: it is extremely complex. ";
[see below E. F. Codd]
[Note: investigate the use of category theory for data modelling (see Category Theory 101). How is that different from using graph theory? In particular, see the work by Zinovy Diskin and Boris Cadish (links to searches on CiteSeer)
Not relevant: they proposed to use arrow logic as a formal foundation for various graphical notations used for Relational DB schema specification (and other specification tasks) and category theory for Relational DB schema reconciliation (and translation to/from various notations).] .
- These two papers are examples of attempts to salvage XML although its shortcomings are every day more evident (see also below the XML binary format proposal):
- How Much Pain for XML's Gain? by Michael Champion.
- XML Parsing: A Threat to Database Performance (.pdf) by Matthias Nicola, Jasmi John:
Abstract: XML parsing is generally known to have poor performance characteristics relative to transactional database processing. Yet, its potentially fatal impact on overall database performance is being underestimated. We report real-world database applications where XML parsing performance is a key obstacle to a successful XML deployment. There is a considerable share of XML database applications which are prone to fail at an early and simple road block: XML parsing.
The "ingenuous" reader, at this point, might think that they suggest discarding XML. Of course not:
We analyze XML parsing performance and quantify the extra overhead of DTD and schema validation. Comparison with relational database performance shows that the desired response times and transaction rates over XML data can not be achieved without major improvements in XML parsing technology. Thus, we identify research topics which are most promising for XML parser performance in database systems.
Amazing!
- Note also that OO is not the solution, see, for example, ON HARD TO TURN TABLES.
By Tim Berners-Lee:
- this essay: Relational Databases on the Semantic Web, here he does not make much sense, but it's too big to comment [make a comment in another node?];
- in Press FAQ, I have this great new idea - changing the world he assesses his work in this way:
[..] (I didn't find lots of people willing to get excited about the idea of the web. They quite reasonably asked to know why it was different from the past, or other hypertext systems. In retrospect, it was mainly that the decentralized database is removed, allowing the system to scale, but allowing for dangling links. But it took a long time for that to surface as the novelty.) [..]
Here he turns the weakness of a bad design into a virtue. How much better web servers "scale" than Distributed DBM/IR Systems is debatable (to put it euphemistically).
Weakness of the Hypertext model: links!
[Review W3C XML Linking Language (XLink)].
We can stress the weakness of traditional links with various contrasts, different in terms but similar as concepts expressed, for example:
- explicit vs implicit (declarative/computable/lazy) associations/constraints
(in fact, at least for local/on-site links, and at least when there is a DBMS backend, they are usually generated and are there only for UI purposes; see, for example, e-commerce sites);
- or associations by references/pointers vs associations by values (relational algebra/calculus);
- or address-based vs content-based (Robust Hyperlinks);
- or one way vs two (many) ways links;
- or the "Access Path Dependence" problem as E. F. Codd defined it in A Relational Model of Data for Large Shared Data Banks.
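The contrast between associations by reference and associations by value, and the access path dependence it leads to, can be sketched as follows. The example is loosely modelled on the parts-and-projects illustration in Codd's paper; the data values, names and functions are made up for this sketch.

```python
# Sketch under stated assumptions: all names and quantities are illustrative.

# Reference/path style: parts are reachable only through their project.
by_project = {
    "P1": {"parts": [{"part": "bolt", "qty": 10}, {"part": "nut", "qty": 5}]},
    "P2": {"parts": [{"part": "bolt", "qty": 3}]},
}

def qty_of_part_via_projects(tree, part):
    """Written against one specific access path: project -> parts."""
    return sum(p["qty"]
               for project in tree.values()
               for p in project["parts"]
               if p["part"] == part)

# The same facts reorganized with parts on top: the function above no longer works.
by_part = {
    "bolt": {"projects": [{"project": "P1", "qty": 10}, {"project": "P2", "qty": 3}]},
    "nut": {"projects": [{"project": "P1", "qty": 5}]},
}

# Value style: one flat set of tuples, no privileged path between the facts.
commitments = {("P1", "bolt", 10), ("P1", "nut", 5), ("P2", "bolt", 3)}

def qty_of_part(rel, part):
    return sum(q for (_project, p, q) in rel if p == part)

def qty_for_project(rel, project):
    return sum(q for (pr, _part, q) in rel if pr == project)

print(qty_of_part_via_projects(by_project, "bolt"))
print(qty_of_part(commitments, "bolt"))
print(qty_for_project(commitments, "P1"))
```

A hyperlink behaves like the first representation: it encodes where the target sits, so reorganizing the target breaks the link, whereas a value-based association is recomputed from the content itself.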
Essay by Tim Berners-Lee:
- Links and Law: Myths, in which he raises interesting points about the right to make reference to work of others (here he makes sense).
Weakness of HTTP (which people continually run up against):
[Review W3C HTTP - Hypertext Transfer Protocol.]
- transaction management;
- session management; [HTTP is stateless, but see RFC2965 - HTTP State Management Mechanism based on cookies and RFC2964 - Use of HTTP State Management]
- security; [but see RFC 2817 - Upgrading to TLS within HTTP/1.1 and RFC 2818 - HTTP Over TLS. Contrast also with censorship-resistant solutions like Freenet]
- version management; [but see WebDAV Resources Web-based Distributed Authoring and Versioning: [..] a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers. ]
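As an illustration of the session management item above: because each HTTP request is independent, any notion of a "session" has to be rebuilt on every request from an identifier that the client sends back, typically a cookie. The sketch below is a toy, not a real server: the `handle` function, the `sid` cookie name and the in-memory `sessions` table are assumptions made for this example, and only the parsing uses the standard library's `http.cookies`.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical in-memory session table; a real server would need expiry,
# persistence and protection against forged ids.
sessions = {}

def handle(request_headers):
    """Handle one request in isolation; return (response_headers, body)."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    sid = cookie["sid"].value if "sid" in cookie else None

    response_headers = {}
    if sid not in sessions:
        # Unknown or missing id: the protocol itself remembers nothing,
        # so the server creates state and asks the client to carry the key.
        sid = uuid.uuid4().hex
        sessions[sid] = {"visits": 0}
        out = SimpleCookie()
        out["sid"] = sid
        response_headers["Set-Cookie"] = out["sid"].OutputString()

    sessions[sid]["visits"] += 1
    return response_headers, f"visit {sessions[sid]['visits']}"

# First request carries no cookie; the second echoes the one it was given.
first_headers, first_body = handle({})
second_headers, second_body = handle({"Cookie": first_headers["Set-Cookie"]})
print(first_body, second_body)
```

A client that drops the cookie is indistinguishable from a new visitor, which is why state management had to be bolted on beside the protocol rather than being part of it.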
Winning the Application Server Arms Race: Using Smalltalk to Redefine Web Development keynote by Avi Bryant, at Smalltalk Solutions 2004 (also at James Robertson's blog StS 2004 - The Seaside keynote)
Abstract: It would be hard to imagine a worse model for user interface development than HTTP. Would you use a GUI framework where every event from every widget in your application was handed to you at once, periodically, as a large hashtable full of strings? Where every time a single piece of data changed you had to regenerate a textual description of the entire interface? Where the event-based architecture was so strict that you couldn't ever, under any circumstances, open a modal dialog box and wait for it to return an answer?
Those are the costs of using the web browser as a client platform, and, by and large, we accept them. The dominant paradigms of web development -- CGI, Servlets, Server Pages -- do very little to hide or circumvent the low level realities of HTTP, and as a result, web applications are fragile, verbose, and ill-suited to reuse.
[..]
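The "modal dialog" complaint in the abstract can be illustrated with a small sketch. This is not Seaside's implementation: Python generators stand in here for the general idea of suspending a computation between requests, and the `checkout_flow`, `request` and `pending` names are hypothetical.

```python
# Sketch under stated assumptions: a generator is used as a resumable
# computation, and each call to request() plays the role of one HTTP round trip.

def checkout_flow():
    """Reads like a modal dialog: ask, wait for the answer, carry on."""
    address = yield "Shipping address?"
    confirmed = yield f"Ship to {address}? (yes/no)"
    return "Order placed" if confirmed == "yes" else "Order cancelled"

pending = {}  # suspended flows, keyed by a hypothetical session id

def request(session_id, form=None):
    """Start the flow, or resume it with the value the user submitted."""
    if session_id not in pending:
        flow = checkout_flow()
        pending[session_id] = flow
        return next(flow)          # run up to the first question
    flow = pending[session_id]
    try:
        return flow.send(form)     # resume where the flow was waiting
    except StopIteration as done:
        del pending[session_id]    # flow finished; drop the saved state
        return done.value

print(request("s1"))
print(request("s1", "Oslo"))
print(request("s1", "yes"))
```

Without such a mechanism the same logic has to be split into one handler per page, with the intermediate values threaded through by hand as strings. A generator can only be resumed once from each point, so this sketch does not cope with the browser's back button.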
Google for "weakness of HTTP" gives only 7 results (4-jan-2003) .. amazing!
[Review REST (REpresentational State Transfer - RESTwiki)].
Weakness of Markup Languages:
A talk given by Aaron Crane: Does XML Suck? Or: Why XML is technologically terrible, but you have to use it anyway. On the same site The Big List of XML Technologies.
Why XML is awful§ (another copy and old broken link).
By Rita Knox: Here's What's Wrong With XML-Defined Standards - ".. Wasn't XML supposed to make data shareable? No. XML provides the tools to define shareable data models[sic], but it does not make them shareable any more than the alphabet makes every word in the English language understood by anyone who speaks English. .." [or the "naïve", with "self describing", imply that you don't need agreement on semantic and pragmatic issues before exchanging information.]
Fabian Pascal shows that XML is a poor choice for data transmission too:
See also XML Binary Characterization Working Group Public Page:
The XML Binary Characterization Working Group is tasked with gathering information about uses cases where the overhead of generating, parsing, transmitting, storing, or accessing XML-based data may be deemed too great for a particular application, characterizing the properties that XML provides as well as those that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate (binary) encodings provide the required properties.
As said in the XML node, this is an implicit admission by W3C that the standard XML format is a useless waste of resources.
Erik Naggum, maintainer of the SGML Repository at the University of Oslo for nearly 6 years, before:
[??? where is it? it was a citation on the front page of an official SGML site ???]
.. and after the SGML cure: Erik Naggum on SGML and DSSSL, Arguments against SGML.
Also these threads on comp.lang.lisp: Core ideas behind SGML and XML and "Re: The Next Generation of Lisp Programmers":
- Message-ID: <3239282082503601@naggum.no> - "Furthermore, the more you think about things in SGML terms, the more you realize that fully explicit hierarchical structure is not enough. ",
- Message-ID: <3239316488198320@naggum.no>, a psychological perspective about markup languages of the SGML family:
[..] I related my discovery that syntax matters when I had worked with SGML for half a decade. It is because of the stupid syntax of SGML that things turn bad when people try to use it. [..]
[..] In many important ways, SGML is its own worst enemy. It has managed to teach information structuring completely backwards. Instead of making the edges explicit but non-intrusive as in Common Lisp, the edges are much too visible and verbose with the stupid tags that very quickly consume more of the large volume of characters in SGML/XML documents than the contents. I am sure it was thought of as useful back when people needed to be converted, but once converted, it gets in the way more than anything else and leads people to make mistakes if they do not think very carefully about what they try to do. [..]
Contrast such verbosity with Syntax in K (again a psychological perspective).
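Naggum's remark that tags can consume more characters than the contents is easy to check on a toy example. The sketch below serializes the same small tree once as XML, using the standard library's `xml.etree.ElementTree`, and once in a parenthesized, Lisp-like notation. The tree, the helper names and the notation are assumptions made for illustration, not a proposal for an interchange format.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: (tag, child, child, ...) where a child is either a
# string of content or another such tuple.
doc = ("essay", ("title", "TUNES vs the WWW"), ("author", "MaD70"))

def to_element(node):
    """Convert the tuple tree into an ElementTree element (leaf text only)."""
    tag, *children = node
    elem = ET.Element(tag)
    for child in children:
        if isinstance(child, str):
            elem.text = (elem.text or "") + child
        else:
            elem.append(to_element(child))
    return elem

def to_sexp(node):
    """Write the same tree with one pair of parentheses per node."""
    tag, *children = node
    parts = [tag] + [f'"{c}"' if isinstance(c, str) else to_sexp(c)
                     for c in children]
    return "(" + " ".join(parts) + ")"

def content_chars(node):
    """Count only the characters of actual content, ignoring structure."""
    _tag, *children = node
    return sum(len(c) if isinstance(c, str) else content_chars(c)
               for c in children)

xml_text = ET.tostring(to_element(doc), encoding="unicode")
sexp_text = to_sexp(doc)
content = content_chars(doc)

print(xml_text)
print(sexp_text)
print("XML markup overhead:", len(xml_text) - content, "for", content, "content chars")
print("Parenthesized overhead:", len(sexp_text) - content)
```

Every element name appears twice in the XML form, so the overhead grows with both the number of nodes and the length of the names; in the parenthesized form each node costs its name once plus two delimiters.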