On information overload

Today's WSJ (subscription for fulltext), in the Business Technology section, juxtaposes two pieces on the hypergrowth of digital information, discussing its reasons, its effects, and some responses to the growth.

The first piece is a straightforward and informative mini-whitepaper, entitled "Cutting Files Down to Size". It mentions the efforts of Chevron and Credit Suisse to control their information, primarily through implementation of new tools, new methods, and new employee work habits. There's something about an information base that's presently 1.2 petabytes in size, potentially growing by 57% per year, that can focus the minds of management. Add to that the million email messages per day that the 59,000-employee Chevron claims to process, and you're talking some serious data. So much data, perhaps, that it's not even possible to glean the information from it. A veritable flood.
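(A back-of-the-envelope aside: compounding is what makes that 57% figure scary. A quick sketch, using only the two numbers from the article, shows the base roughly decupling in five years.)

```python
# Projecting the article's figures: a 1.2-petabyte information base
# growing at 57% per year. Nothing here beyond compound growth.
start_pb = 1.2   # current size, petabytes
growth = 1.57    # 57% annual growth factor

for year in range(1, 6):
    size = start_pb * growth ** year
    print(f"Year {year}: {size:.1f} PB")
```

By year five that's roughly 11.4 PB, nearly ten times the current base, which goes some way toward explaining the focused minds of management.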

The primary solutions discussed are conceptually simple:

  • Get people to pay attention to the amount by which they're increasing the deluge
  • Put systems in place to eliminate redundancy, such as using Microsoft's well-reviewed SharePoint server, ensuring that even the worst PowerPoint slide decks are only stored once
  • Admit, and get data creators to admit, that not all information is equally valuable

All excellent steps, though both expensive and difficult to implement. Totally aside from the grotesque knock-on effects of continually increasing technology infrastructure to store all new information, the real benefit from such efforts is to remove potential sources of background noise: the unimportant, the duplicative, and the no-longer-operative. "Data" is both easy and not intrinsically valuable - it's "information" that's both difficult and valuable, and too much data can obscure the information. Best of luck to the contestants in slaying their particular dragons.

The companion piece, on the same page, in Lee Gomes' "Talking Tech" column, was the more intriguing of the two. Entitled "Computers Should Be Taught To Let Certain Memories Go", it contained an interview with Harvard KSG professor Viktor Mayer-Schoenberger, and was among the more thought-provoking pieces in the entire day's paper.

Mr. Mayer-Schoenberger's thesis is this:

Human beings ... weren't designed to remember everything we ever learned, and sometimes are better off when we forget. Computers, he adds, should as a result be taught to let some memories go.
...
We are biologically hard-wired to selectively remember. But in moving into a digital age, we are now surrounding ourselves with tools that have inversed (sic) that.
...
How does this make life different?

In the predigital age, we might have called someone who knew a person we were interested in learning about, got them to tell us about the person. And we would get a quick picture -- but not a complete and comprehensive picture of each and every piece of communication or behavior that the person did over the past 20 years. I think we have lost something by moving from that sort of short encapsulation toward a complete picture that provides us with all the details, the sort that over time, we as a society, and as human beings, tend to forget.

But what's the problem with that?

Things that happened 10 or 15 years ago might have happened to a different person. Therefore, we should put less weight on what we did 15 years ago than we would do now. In the past, our brains did this automatically for us by forgetting it. But we haven't been able to develop another evolutionary method, another method by which we can weigh things that happened further in the past differently from those that happened more recently.

(ellipses mine)

Interesting theory, and one that makes some rational sense. I can't speak for anyone but myself on the subject, but I'm surely not the same person today as I was 15 years ago, and would want any judgment of me weighted more on the current me than the one from decades ago.
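His "weigh the distant past less" idea has a natural quantitative reading: recency-weighted evidence, where old behavior decays rather than vanishing outright. A minimal sketch, with an entirely made-up five-year half-life, purely to illustrate the shape of the argument:

```python
# Hypothetical sketch: weighting evidence about a person by recency,
# so fifteen-year-old behavior counts far less than last year's.
# The five-year half-life is an arbitrary illustrative choice.
def recency_weight(years_ago, half_life=5.0):
    """Exponential decay: the weight halves every `half_life` years."""
    return 0.5 ** (years_ago / half_life)

print(recency_weight(0))    # today's behavior: full weight
print(recency_weight(5))    # five years back: half weight
print(recency_weight(15))   # fifteen years back: an eighth
```

Under that (arbitrary) half-life, the me of 15 years ago would count one-eighth as much as the me of today, which feels about right.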

The (mild) shocker in the piece, however, was this, his prescription for a solution:

My proposal is that we have a law that mandates that software coders build into software a better ability for people to let their digital tools forget, if they so wish. Right now, both Windows as well as Mac OS have a huge amount of meta data that they keep track of for each file that we use: "Date Created," "Owner," and so on. So I suggest that we add another type of meta data: "Expiration Date."

Conceptually, he has a point - that would be at least a potential solution to the problem he's laid out. Why the rush to what I can guarantee would be massively ineffectual legal efforts, I wondered? For starters, I presumed it was because he's an associate professor with Harvard's Kennedy School of Government, whose faculty, perhaps by definition, thinks more abstractly and less rationally than, say, Harvard Business School's does. Then I visited his faculty page at KSG:

...He advises businesses, governments, and international organizations on regulatory and policy issues. He holds a bunch of law degrees, including one from Harvard, and an MS (Econ) from the London School of Economics.

That explains it. Ignoring any questions about how many law degrees one can effectively use, the "bunch" he holds appear to have been enough to outweigh any pragmatism learnt at LSE.
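Legislation aside, the mechanics of his "Expiration Date" proposal are simple enough to sketch. No operating system exposes such a field today, so the following is purely hypothetical: file metadata simulated as plain records, with a sweep that drops whatever has lapsed.

```python
# Hypothetical sketch of the proposed "Expiration Date" metadata.
# FileRecord and sweep() are invented here for illustration; neither
# Windows nor Mac OS offers anything like this natively.
from datetime import date

class FileRecord:
    def __init__(self, name, expires_on=None):
        self.name = name
        self.expires_on = expires_on  # None means "keep forever"

def sweep(records, today):
    """Return only the records that have not yet expired."""
    return [r for r in records
            if r.expires_on is None or r.expires_on >= today]

files = [
    FileRecord("tax_return_2006.pdf"),                 # no date: keep forever
    FileRecord("draft_memo.doc", date(2007, 1, 1)),    # already lapsed
    FileRecord("meeting_notes.txt", date(2099, 1, 1)), # far future
]
kept = sweep(files, date(2007, 6, 1))
print([r.name for r in kept])  # the stale draft is forgotten
```

The point of the sketch is how little machinery the idea needs - one extra field and a periodic sweep - which is exactly why mandating it by law, rather than letting vendors compete on it, seems like overkill.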

(also posted at issuesblog.com)

Posted by Patton Patton

§ 6 Comments

1

There’s something about an information base that’s presently 1.2 petabytes in size, potentially growing by 57% per year, that can focus the minds of management.

One suspects that what specifically focused the management's attention was the cost of storage. Or at least such was the case at the day job recently.

Well, just add on more storage.
Okay. It will cost $x.
Hey, let's talk about cutting that down to size ..

2

Agreed on what focuses management's attention.

Sad, really, because I don't think that, in any absolute sense, the purchase price of new data storage capacity should be either #1 or #2 on the list for prioritization.

4

Perhaps I should be more clear: it's not really prioritizing by cost so much as filtering one's options based on cost.

The cost of the excess infrastructure to store data that may never be fully and properly utilized is among the least of the problems. Being unable to actually turn the data into useful information is a far more insidious cost.

Or so I'd argue.

6

Addendum - Having read Dr. Mayer-Schönberger's paper, Beyond Copyright: Managing Information Rights with DRM, it's clear that his definition of the problem and the solutions that can mitigate it don't translate well into a "20 questions" format.

Digital rights management is, as the title indicates, at the heart of his thinking, and for anyone who puckers at the thought of DRM, his paper provides an excellent antidote to reflexive rejection of the concept.

I thank the professor for pointing me to the paper, and am better informed for having read it.

[ You're too late, comments are closed ]