Sunday, February 15, 2009

Wikipedia and Britannica are Converging

Is there a network equivalent of market forces? The crowdsource equivalent (or perhaps generalisation) of Adam Smith's "Invisible Hand"?

When I was in my early teens my school got the new edition of the Encyclopaedia Britannica in over 30 volumes costing around £1,000. It was a huge wealth of authoritative information on just about everything. I spent quite a few lunch hours just reading about stuff.

In the 90s Microsoft brought out Encarta on CD-ROM. It contained a lot less information, but it cost a lot less than £1,000. Britannica has been in trouble ever since. Now there is Wikipedia, which is free, a bit less authoritative than Britannica, but has even more information and a great deal more mindshare.

So Britannica has responded by taking a page out of Jimbo Wales's book; it's going to start accepting user-generated content, albeit with a stiff editorial barrier to acceptance into the core Encyclopaedia.

Meanwhile the latest controversy on Wikipedia is about whether user-generated content needs more controls. I say "more" because Wikipedia has always had some limits; locked pages, banned editors, and in extreme cases even the deletion of changes (as opposed to reversion, which preserves them in the page history). Increasingly sophisticated editorial policies have been put in place, such as "No Original Research" (originally put in to stop crank science, but now an important guiding principle).

When you look at Wikipedia's history there is a clear trend; every so often Wikipedia reaches a threshold of size and importance where its current level of openness doesn't work. At this point Jimbo Wales adds a minimal level of editorial control. Locked pages, for instance, were added because certain pages attracted unmanageably high levels of vandalism.

It seems that Wikipedia and Britannica are actually converging on the same model from different ends of the spectrum: Wikipedia started out completely open and is having to impose controls to manage quality. Meanwhile the Britannica started out completely closed and is having to allow in user-generated content because commissioning experts to write it is too expensive. Somewhere in the middle is a happy medium that gets good quality content without having to charge subscribers.

Not charging is important. In these days of the hyperlink, an encyclopedia is not just a source of knowledge, it is a medium of communication. If I want to tell people that they should read about the history of the Britannica then I refer to the Wikipedia page because very few people who read this are going to be able to follow a link to a Britannica article (even if I were a subscriber, which I am not).

Wikipedia is often said to have been inspired by the "open source" model in that anyone can edit Wikipedia just as anyone can edit the Linux source code. In fact the cases are not parallel. The GPL allows me to download a copy of the Linux source code, hack it to my heart's content, and give copies of my new version to anyone who wants them. What it does not do is authorise me to upload my version to kernel.org and pass it off as the work of Linus Torvalds. Getting my changes into the official kernel means passing through a strict quality assurance process including peer review and extensive testing on multiple architectures.

So I think that this proposal to create "flagged revisions" for editorial review moves Wikipedia towards the open source model rather than away from it. Anyone will always be able to fork Wikipedia if they wish: the license guarantees it. But the official version at wikipedia.org will earn increasing trust as the quality assurance improves, just as the official version of the Linux kernel is trusted because of its quality assurance.

We Don't Know How We Program

I was talking to a colleague from another part of the company a couple of weeks ago, and I mentioned the famous ten-to-one productivity variation between the best and worst programmers. He was surprised, so I sketched some graphs and added a few anecdotes. He then proposed a simple solution: "Obviously the programmers at the bottom end are using the wrong process, so send them on a course to teach them the right process."

My immediate response, I freely admit, was to open and shut my mouth a couple of times while trying to think of a response more diplomatic than "How could anyone be so dumb as to suggest that?". But I have been mulling over that conversation, and I have come to the conclusion that the suggestion was not dumb at all. The problem lies not with my colleague's intelligence but in a simple fact. It is so basic that nobody in the software industry notices it, but nobody outside the industry knows it. The fact is this: there is no process for programming.

Software development abounds with processes of course: we have processes for requirements engineering, requirements management, configuration management, design review, code review, test design, test review, and on and on. Massive process documents are written. Huge diagrams are drawn with dozens of boxes to try to encompass the complexity of the process, and still they are gross oversimplifications of what needs to happen. And yet in every one of these processes and diagrams there is a box which basically says "write the code", and ought to be subtitled "(and here a miracle occurs)". Because the process underneath that box is very simple: read the problem, think hard until a solution occurs to you, and then write down the solution. That is all we really know about it.

To anyone who has written a significant piece of software this fact is so obvious that it seems to go without saying. We were taught to program by having small examples of code explained to us, and then we practiced producing similar examples. Over time the examples got larger and the concepts behind them more esoteric. Loops and arrays were introduced, then pointers, lists, trees, recursion, all the things you have to know to be a competent programmer. Like many developers I took a three-year degree course in this stuff. But at no point during those three years did any lecturer actually tell me how to program. Like everyone else, I absorbed it through osmosis.

But to anyone outside the software world this seems very strange. Think about other important areas of human endeavor: driving a car, flying a plane, running a company, designing a house, teaching a child, curing a disease, selling insurance, fighting a lawsuit. In every case the core of the activity is well understood: it is written down, taught and learned. The process of learning the activity is repeatable: if you apply yourself sufficiently then you will get it. Aptitude consists mostly of having sufficient memory capacity and mental speed to learn the material and then execute it efficiently and reliably. Of course in all these fields there are differences in ability that transcend the mere application of process. But basic competence is generally within reach of anyone with a good memory and average mental agility. It is also true that motor skills such as swimming or steering a car take practice rather than book learning, but programming does not require any of those.

People outside the software industry assume, quite reasonably, that software is just like all the other professional skills; that we take a body of knowledge and apply it systematically to particular circumstances. It follows that variation in productivity and quality is a solvable problem, and that the solution lies in imposing uniformity. If a project is behind schedule then people need to be encouraged to crank through the process longer and faster. If quality is poor then either the process is defective or people are not following it properly. All of this is part of the job of process improvement, which is itself a professional skill that consists of systematically applying a body of knowledge to particular circumstances.

But if there is no process then you can't improve it. The whole machinery of process improvement loses traction and flails at thin air, like Wile E. Coyote running off a cliff. So the next time someone in your organisation says something seemingly dumb about software process improvement, try explaining that software engineering has processes for everything except actually writing software.

Update: Some of the discussion here, and on Reddit and Hacker News, argues that many other important activities are creative, such as architecture and graphic design. Note that I didn't actually mention "architecture" as a profession, I said "designing a house" (i.e. the next McMansion on the subdivision, not one of Frank Lloyd Wright's creations). People give architects and graphic designers room to be creative because social convention declares that their work needs it. The problem for software is that non-software-developers don't see anything creative about it.

The point of this post is not that software "ought" to be more creative or that architecture "ought" to be less. The point is that we need to change our rhetoric when explaining the problem. Declaring software to be creative looks to the rest of the world like a sort of "art envy", or else special pleading to be let off the hook for project overruns and unreliable software. Emphasising the lack of a foundational process helps demonstrate that software really does have something in common with the "creative" activities.
