Grand Challenge: Memories for Life

The second review of Wetware-related Grand Challenge proposals discusses the Memories for Life project, proposed by Andrew Fitzgibbon with the Robotics Research Group at the University of Oxford and Ehud Reiter, lecturer in Computing Science at the University of Aberdeen.

The Memories for Life project is subtitled “Managing information over a human lifetime”. It addresses people’s need for a unified system to store, manage and access the “ever-increasing amount of information about themselves, including emails, web browsing histories, digital images, and audio recordings” that they amass over a lifetime.

In the early 1930s, Vannevar Bush first wrote of a device he called the Memex, later made famous by one of the most cited articles in hypertext research, “As We May Think”, published in The Atlantic Monthly in 1945. According to the article, a Memex is “a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”. Bush goes on to describe something similar to the links and nodes of today’s Web. Given the description of the device, however, I reckon that what Bush had in mind was closer to what the Memories for Life project is trying to achieve than to the World Wide Web. Hypertext-like methodology was just the medium he saw fit to display and organize the vast amount of data stored in the Memex. He actually goes on to say: “[The Memex] is an enlarged intimate supplement to [the individual’s] memory.”

With today’s communication technologies, we have certainly moved closer to the Memex, but the bits and pieces of data are very fragmented, which makes them a lot less valuable than they could be. I can search my email inbox for a letter containing the word “memex”, but the results will leave out my encounters with the word on a website or in a telephone conversation, let alone in a book or a magazine that is not in electronic format. Most likely, however, the reason for my search is that I remember coming across the word but not where, and I am looking for that reference.

In a recent Wetware entry, I mentioned the information beam envisioned by David Gelernter in The Next Fifty Years. Real-life projects aiming at similar results include Microsoft Research’s MyLifeBits and DARPA’s LifeLog.

Since these two research giants are working on projects similar to Memories for Life, one can assume that there might be something in it.

Memories for Life identifies several problems facing any project with similar ambitions. These problems can roughly be divided into three categories: 1) long-term storage, 2) searching and effectively presenting vast amounts of data in different formats, and 3) security and privacy issues.

Let’s look at each of these three problems:

1. Long-term storage

Storing data for a long time basically faces two challenges: keeping the data itself, and not losing the ability to read it.

The problem with simply keeping the data is maintaining it on a medium that will last for a long time (we’re talking about storing data for up to 100 years or even more), or ensuring that it is backed up and migrated often enough to survive even “once in a century disasters”. No current data storage method has been proven to work for that long, simply because we don’t have any 100-year-old computer data. Some storage media, however, have been shown to last no more than 5-10 years, e.g. the CD-ROM. Magnetic hard drives crash once in a while and tapes lose their storage capability over time. Long-term data storage actually has its own Grand Challenge-like project: Time and Bits, by The Long Now Foundation.
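Migrating data regularly only helps if each new copy is still identical to the old one. The proposal doesn't say how that would be checked, so the following is just my own minimal sketch of one common approach: record a checksum when an item is first archived and compare it after every migration. The function names and the sample bytes are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a stored object, recorded when it is first archived."""
    return hashlib.sha256(data).hexdigest()


def verify_copy(original_digest: str, copied: bytes) -> bool:
    """True if a migrated copy still matches the digest taken at archive time."""
    return fingerprint(copied) == original_digest


# Hypothetical stored object; in practice this would be read from disk or tape.
photo = b"pretend these bytes are a photograph"
digest = fingerprint(photo)

print(verify_copy(digest, photo))               # intact copy -> True
print(verify_copy(digest, photo[:-1] + b"X"))   # one damaged byte -> False
```

A check like this only detects damage. Recovering from it still depends on having another good copy somewhere.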

The second problem is not quite as obvious, but no less important. Even if the data is securely stored, we also need hardware and software that can read it and make sense of it.

There are still companies in the business of making punch card readers to read data from this long-gone medium, and their business obviously comes from people who are trying to access such data. Data stored on 8-inch, 5¼-inch and even 3½-inch diskettes can be hard to retrieve, not because the data is no longer there, but because the equipment to read it is hard to find. Data and programs for nostalgic platforms such as the Sinclair Spectrum, Commodore 64 and Atari are hard to read or run because working examples of the platforms have become rare.

Even relatively new data can be hard to access because of the proprietary file formats that some manufacturers use. I wonder, for example, how many people who own a Canon digital camera will be able to access their CRW-formatted photos in 10 years’ time, when the proprietary software to read them could be hard to find. Fortunately, most camera manufacturers stick to more standard formats like JPEG, which are less likely to cause trouble in the long run.

As a solution to this, some people have proposed file formats that store not only the data but also information about how to read it. This would solve the problem, but it is unsuitable for most everyday use, as it would require a lot of extra space to store even the smallest amount of data. A better approach for a project like MfL is probably to stick to the most common and standard file formats: JPEG for images; MPEG for video; HTML, Word DOC or PDF for documents; MP3 for audio; XML for most structured data; and so on. Files in other formats would be converted to these standard formats when encountered, and stored in the converted form.
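The convert-on-arrival idea can be sketched as a simple lookup that decides, for each incoming file, whether to store it as is or convert it first. This is only an illustration under my own assumptions: the extension table and the `plan_ingest` function are hypothetical, and the actual conversion step is left out.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to the standard format it would
# be archived in; the targets mirror the formats named above.
ARCHIVAL_TARGET = {
    ".jpg": "jpeg", ".jpeg": "jpeg", ".crw": "jpeg", ".bmp": "jpeg",
    ".mpg": "mpeg", ".mpeg": "mpeg", ".avi": "mpeg",
    ".html": "html", ".htm": "html", ".doc": "doc", ".pdf": "pdf",
    ".mp3": "mp3", ".wav": "mp3",
    ".xml": "xml", ".csv": "xml",
}

# Extensions that are already in a standard format and need no conversion.
STANDARD_EXTENSIONS = {
    ".jpg", ".jpeg", ".mpg", ".mpeg", ".html", ".htm",
    ".doc", ".pdf", ".mp3", ".xml",
}


def plan_ingest(filename: str) -> tuple:
    """Return (action, target_format) for a file entering the archive."""
    ext = PurePath(filename).suffix.lower()
    if ext not in ARCHIVAL_TARGET:
        # Unknown format: keep the original bytes rather than guess.
        return ("keep-original", "unknown")
    if ext in STANDARD_EXTENSIONS:
        return ("store", ARCHIVAL_TARGET[ext])
    return ("convert", ARCHIVAL_TARGET[ext])


print(plan_ingest("IMG_0001.CRW"))  # ('convert', 'jpeg')
print(plan_ingest("letter.pdf"))    # ('store', 'pdf')
print(plan_ingest("notes.xyz"))     # ('keep-original', 'unknown')
```

Keeping the original of anything that can't be recognised seems the safer default, since a later converter may still be able to read it.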

2. Searching and presenting the data

The second main problem MfL identifies is how to index all the data so that vast amounts of it, in a variety of formats, can be searched effectively.

As an example, I repeatedly find myself searching for a bit of information I remember seeing somewhere recently, but can’t remember where. The possibilities include the Web, email, a newspaper or magazine, television, radio and a conversation (face-to-face or on the telephone), just to name a few. Can you imagine how convenient it would be to search all these sources in one go? Even leaving out the “real world” data and searching only the digital data would already be a great help: “SEARCH everything I’ve read for the last two weeks FOR ‘car’”.

This actually shouldn’t be too hard to implement with today’s technology. Added functionality could include searching for synonyms and related words at the same time, e.g. a search for “car” would return results for “automobile” as well. Attempts at this are certainly being made with programs such as Scopeware. Such programs could work very well for text, but then we have images, audio and video as well. Despite recent achievements in the audio field, automatic indexing of images and video is something computers are still quite far from achieving.
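For text, the “SEARCH everything … FOR ‘car’” query could look roughly like the sketch below: one list of items from different sources, a date cut-off, and a small synonym table. Everything in it (the `Item` record, the `SYNONYMS` table, the sample archive and the fixed date) is made up for illustration; a real system would need a proper index and a real thesaurus.

```python
from dataclasses import dataclass
from datetime import date, timedelta
import re

# Hypothetical synonym table; a real system would draw on a thesaurus.
SYNONYMS = {"car": {"car", "automobile"}}


@dataclass
class Item:
    source: str     # e.g. "email", "web", "document"
    seen_on: date   # when the user read or received the item
    text: str


def search(items, term, since):
    """Return items seen on or after `since` that contain `term` or a synonym."""
    wanted = SYNONYMS.get(term.lower(), {term.lower()})
    hits = []
    for item in items:
        if item.seen_on < since:
            continue
        # Match whole words only, so "carpet" does not count as "car".
        words = set(re.findall(r"[a-z0-9']+", item.text.lower()))
        if words & wanted:
            hits.append(item)
    return hits


today = date(2003, 6, 15)  # arbitrary fixed date so the example is reproducible
archive = [
    Item("email", date(2003, 6, 10), "The car is ready for pickup."),
    Item("web", date(2003, 6, 12), "Review: a very economical automobile"),
    Item("email", date(2003, 4, 1), "Selling my old car"),
    Item("document", date(2003, 6, 14), "Minutes of the carpet committee"),
]

results = search(archive, "car", since=today - timedelta(weeks=2))
print([r.source for r in results])  # ['email', 'web']
```

The April email is dropped by the two-week cut-off, and the “carpet” document is skipped because only whole words are matched.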

The MfL project proposal talks a lot about how to present and personalize the results to suit users’ needs. I must say that an effective solution to the search problem alone would be a great achievement and should be the number one focus. General personalization software is developing fast, and the standard personalization solutions available 15 years from now, combined with the search results envisioned in MfL, should do the trick. That seems a better bet than spending a lot of effort on personalization within MfL.

The data in MfL’s stored memories will still be the best possible source for adjusting and configuring the personalization software. I think commercial solutions will have met the following criteria within 5-10 years, especially if they are fed with such exact personalization data:

“For example, a short-term challenge could be to develop a model of a user’s literacy level by analysing examples of what he or she reads and writes, and linguistically simplify web pages based on this model; this would help the 20% of the UK population with poor literacy.”

Personally I think something like:

“Virtual Memories (10-year challenge): Create a virtual world that represents an incident from a person’s life, using stored memories of that incident. Reconcile and integrate memories of different modalities (eg, video and emails), and interpolate as necessary in space and time to fill in gaps. For example, a 3D birthday party, or an action replay of one’s greatest sporting moment.”

… is going totally overboard and requires far more effort than the value of the end result.

3. Security and privacy issues

The third problem involves security and privacy. When so much of our “memories” is stored in a digital format, we must be careful about who can access them and how.

“… the challenge of rigorously proving to a sceptical public that their memories are secure from hackers, amoral companies, and ‘Big Brother’ governments.”

Other issues involve memory ownership:

“Should people who are included in another person’s memories (in a digital photograph, for example) have any control over how these memories are used?”
“Should courts or the police have the right to access memories that are relevant to a legal case or criminal investigation?”
One could also ask: Who owns the “memories” you acquire during working hours? Do they belong to your employer, or does the employer at least have some claim to joint ownership, e.g. if you leave the company?

These are all valid and real questions that must be addressed.

– – –

Problems 2 and 3 also face the time challenges put forward in problem 1, as both indexing and security solutions must still work in 10, 20 or even 50 years. This probably requires migration methods similar to those for the storage itself, with indexes and encryption upgraded on a regular basis to meet the latest requirements.

Like the other Grand Challenges, Memories for Life is a highly thought-provoking project. I think it puts a bit too much emphasis on personalization, whereas I see storage and indexing as the real challenges. I would also have liked to see more about the acquisition, storage and indexing of data that does not currently pass through or reside on our computers but nevertheless makes up our memories. I’m talking about our location as well as what we see and hear in the “real world” every day, which for most of us makes up more of our memories than our digital experiences do, but which can be frustratingly hard to search using the biological means we’ve got preinstalled in our brains.

Links:

PowerPoint presentation of Memories for Life (Note: 4 MB)

Official Memories for Life links page

Haystack – MIT MfL-like project

PERMM – Personal Media Management project by AT&T
