I was browsing the internet the other day[1] and came across a link to the first website on the Web: http://info.cern.ch/hypertext/WWW/TheProject.html.
Go ahead, click it. Believe it or not, it still works perfectly in modern browsers a thousand years later![2]
While admiring the site, I randomly clicked on project history. Something there caught my eye. A link to Tim Berners-Lee’s project proposal from 1989.
As a connoisseur of proposals, tech designs, and RFCs[3], I was intrigued. After reading it, I have to tell you, it has some interesting stuff. Let’s take a look at some parts, in no particular order.
The people involved
The manpower required is 4 software engineers and a programmer.
Interesting that there’s a distinction between software engineer and programmer.
Under resources there’s a list of the specific people involved. It lists initials, and I’ve listed in italics who I think they are (based on the project’s people page).
System architect. “Coordinate development, protocol definition, etc; ensures integrity of design. (50% TBL)” (Tim Berners-Lee).
“Market research and product planner. Discuss the project and its features with potential and actual users in all divisions. Prepare criteria for feature selection and development priority. (50% RC?)” (Robert Cailliau?)
Hyper-Librarian[4]. “Oversees the web of available data, ensuring its coherency. Interface with users, train users. Manages indexes and keyword systems. Manages data provided by the project itself. (100% KG?)” (not sure who KG refers to)
Software engineer: NeXTStep. 50% TBL (Tim Berners-Lee). The proposal doesn’t say who was supposed to do the other 50%.
Software engineer: X-windows and human interface. 75% RJ?
Software engineer: IBM mainframe. 50% RC (Robert Cailliau?)
Software engineer: Macintosh. 50% RC (Robert Cailliau?)
Software engineer: C. Help write code for dumb terminal or vt100 browsers, and portable browser code to be shared between browers. This could include a technical student project. (100% NP? + A.N.Other?) (Nicola Pellow?)
Budget
These will cost from 10 to 20k each, totalling 50k. In addition, we would like to use commercially available software as much as possible, and foresee an expense of 30k during development for one-user licences, visits to existing installations and consultancy.
So they had a total budget of 80K[5]. Seems like a pretty good investment. Especially given how much the internet is part of our everyday lives. Not to mention responsible for my career.
Hypertext and “the web”
HyperText is a way to link and access information of various kinds as a web of nodes in which the user can browse at will.
This is a pretty good description of the web!
This forming of a web of information nodes rather than a hierarchical tree or an ordered list is the basic concept behind HyperText.
and later
The network of links is called a web. The web need not be hierarchical, and therefore it is not necessary to "climb up a tree" all the way again before you can go down to a different but related subject.
The idea that the web is not hierarchical is important in the proposal. Seems obvious now, but I wonder if this was a key part that no one had figured out before.
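To make the "no tree climbing" point concrete, here's a minimal sketch of a web of nodes as a plain dictionary of links. The node names are made up by me (loosely inspired by the Joe Bloggs example in the proposal), and `shortest_path` is just a hypothetical helper, not anything from the proposal itself.

```python
from collections import deque

# Hypothetical nodes and links, invented for illustration only.
# Each key is a node; each value lists the nodes it links to.
links = {
    "phone-book": ["joe-bloggs"],
    "software-docs": ["joe-bloggs", "experiment-notes"],
    "joe-bloggs": ["email-list"],
    "experiment-notes": ["software-docs"],
    "email-list": [],
}


def shortest_path(web, start, goal):
    """Follow links breadth-first and return the shortest chain of nodes
    from start to goal, or None if goal cannot be reached."""
    parents = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for target in web.get(node, []):
            if target not in parents:
                parents[target] = node
                queue.append(target)
    return None


print(shortest_path(links, "software-docs", "email-list"))
```

There's no root node anywhere in that dictionary. You get from the software docs to the email list by following links sideways, which is exactly what a strict tree wouldn't let you do.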
Back in the introduction there’s this:
At CERN, a variety of data is already available: reports, experiment data, personnel data, electronic mail address lists, computer documentation, experiment documentation, and many other sets of data are spinning around on computer discs continuously. It is however impossible to "jump" from one set to another in an automatic way: once you found out that the name of Joe Bloggs is listed in an incomplete description of some on-line software, it is not straightforward to find his current electronic mail address. Usually, you will have to use a different lookup-method on a different computer with a different user interface. Once you have located information, it is hard to keep a link to it or to make a private note about it that you will later be able to find quickly.
Makes sense. There was a lot of useful information sitting around, but it was all on different machines, in different systems, accessed in different ways, and not linked together.
Tim Berners-Lee saw a better way of organizing and accessing that info. Not only that, but he also saw the global implications of the system he was building.
Markup
In the project non-goals, there’s:
This project will not aim … to force users to use any particular word processor, or mark-up format.
And under “Operation”:
Once the server has located the requested node, it will know from the node contents what the node's format is (eg. pure ASCII, marked-up, word processor storage and which word processor etc.).
What’s interesting to me is that the proposal seems to allow for many different markup formats, and a given browser might only be able to display some of them.
This is very different from where we’re at today, where there’s one format: HTML[6].
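The proposal doesn't say how a server would work out a node's format "from the node contents", so here's a rough guess at what that could look like. The function name `sniff_format`, the format labels, and the specific byte patterns are all my own assumptions, not something taken from the proposal.

```python
def sniff_format(contents: bytes) -> str:
    """Guess a node's format by peeking at its first bytes.

    A hypothetical sketch: real format detection is far messier than this.
    """
    head = contents.lstrip()[:64]
    if head.startswith(b"%!PS"):
        # PostScript files conventionally begin with "%!PS".
        return "postscript"
    if head.startswith((b"\\documentclass", b"\\documentstyle")):
        # Common opening commands of a LaTeX source file.
        return "latex"
    if head.startswith(b"<"):
        # Assume anything opening with a tag is some kind of mark-up.
        return "marked-up"
    try:
        contents.decode("ascii")
    except UnicodeDecodeError:
        return "unknown"
    return "pure ASCII"


print(sniff_format(b"<h1>Hello</h1>"))
print(sniff_format(b"Just some plain text."))
```

Today's web mostly sidesteps the guessing: the server states the format up front in a `Content-Type` header.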
Future paths
The proposal foresees some future work that would end up being very relevant.
Daemon programs which run overnight and build indexes of available information.
Sounds like search engines.
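For a feel of what such an overnight daemon might build, here's a tiny inverted index: a map from each word to the documents that contain it. The documents, `build_index`, and `search` are all hypothetical names I made up. Real search engines do a great deal more than this.

```python
import re
from collections import defaultdict

# Hypothetical documents, invented for illustration only.
documents = {
    "proposal": "HyperText is a way to link and access information",
    "phone-book": "Phone numbers and electronic mail address lists",
    "docs": "Computer documentation and experiment information",
}


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(documents)
print(sorted(search(index, "information")))
```

The expensive part (reading every document) happens once, ahead of time, so lookups stay cheap. That's presumably why the proposal imagines it running overnight.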
A server automatically providing a hypertext view of a (for example Oracle) database, from a description of the database and a description (for example in SQL) of the view required.
Sounds like web apps.
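Here's a minimal sketch of that idea, with Python's built-in `sqlite3` standing in for Oracle. The table, the sample row, and the `hypertext_view` function are my own inventions for illustration. The proposal only describes the concept.

```python
import sqlite3
from html import escape


def hypertext_view(conn, query):
    """Run a SQL query and render the result as an HTML table."""
    cursor = conn.execute(query)
    headers = [column[0] for column in cursor.description]
    lines = ["<table>"]
    lines.append(
        "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>"
    )
    for row in cursor.fetchall():
        # Escape every value so database contents can't break the mark-up.
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical in-memory database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO people VALUES (?, ?)", ("Joe Bloggs", "joe@example.org")
)

page = hypertext_view(conn, "SELECT name, email FROM people ORDER BY name")
print(page)
```

The SQL string plays the role of the "description of the view required": change the query and you get a different page, with no hand-written HTML involved.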
A serious study of the use and abuse of the system, the sociology of its use at CERN.
I’d love to see a study of the use and abuse of the system at CERN. We all know it would go on to be used and abused by us all.
[1] Being completely productive, I swear.
[2] Actually, the proposal was written in 1989, so it was 33 years or so ago at the time of this writing.
[3] Request for comments.
[4] This role sounds awesome.
[5] The proposal doesn’t say what currency this is in. Probably Swiss francs?
[6] Of course you can author content in many different ways: HTML, PHP, React apps, Elm, htmx, and many more. But web servers send HTML to all clients.
The early WWW has some interesting browsers to play with, e.g. Amaya (https://www.w3.org/Amaya/), which had way more editing options than modern browsers.
At the time TBL was working on the WWW proposal, SGML/DSSSL was a pretty big item to rival HTML for markup. You also have a lot of other markup in common use among the target audience of physicists at CERN, for example LaTeX, which is still very common thanks to its superior math support. So a lot of the existing documents might have been TeX. Even markup like groff wasn't unheard of. So staying open to multiple markup formats was politically wise. This was before even PDF existed, so at best you might have PostScript or EPS or the source markup.
The modern web is not all HTML either: you have CSS, the universe of JavaScript, and all the embedded XML sublanguages like SVG, MathML, and others. They have just been absorbed under the ever-changing HTML label.
In some parts, WWW was actually worse than the already existing Gopher/Veronica hypertext and search engine system, as the search stuff was an afterthought here.
"The idea that the web is not hierarchical is important in the proposal. Seems obvious now, but I wonder if this was a key part that no one had figured out before."
Not really, the fundamental idea of HyperText as a non-hierarchical "web" of information predates this by many years, and is probably why TBL refers to it so frequently in the proposal.