
Massively Multiplayer games as a workplace

More than two years ago, I wrote a post titled When People are Cheaper than Technology. The basic premise there was that the cheapest and best solution to many problems we are trying to solve by building software systems could be to make people part of those systems – basically what Amazon is now trying to do with the Mechanical Turk.

I also wrote another one on what I called Games With a Cause – trying to make simple, fun games whose use would result in data with some research or even monetary value. I even made one, with limited success 🙂

But earlier this week, in yet another nutty conversation with my friend Haukur, we started discussing the incredible amount of time that people all over the world spend playing Massively Multiplayer games (MMORPGs). What if just a tiny portion of this playing time could be turned into some “real work”?

The numbers are staggering:

  • An MMORPG player will spend on average between 12 and 21 hours per week playing their favorite game. (source)
  • Around the year 2000, Ultima Online players were spending more than 160 million man-hours per year playing the game
  • World of Warcraft recently reached 5 million subscribers. If the 12-hour number from above holds true, that’s more than 3.1 billion man-hours in a year.

A typical man-year (must they be called person-years to be politically correct?) has about 1600-2000 man-hours in Western countries, so Blizzard (the maker of World of Warcraft) controls the equivalent of a workforce of 1.5 million people! Even if you could only turn a minuscule amount of this time into monetizable work…

And what could that work be? Most of these games have advanced trading systems. Maybe some financial simulations could be run in the game world; if I remember correctly, the Nobel Prize was awarded a couple of years ago for work in experimental economics, and MMORPGs could be ideal testbeds. Or maybe “stealing” a few computing cycles in the vast grid of connected computers. Or maybe somehow building tasks like image recognition (a la Amazon’s Mechanical Turk) into the gameplay. Or…

My hunch is that simulations of how markets and societies respond to different settings, rules and events would be a good bet.

Imagine – if you could make Blizzard $0.30 per hour, that would amount to about $15 a month, which is incidentally about the same as the monthly subscription – and you could play for free!
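For what it’s worth, here is the back-of-the-envelope math behind those numbers, using only the estimates quoted above (a rough sketch, not measured data):

```typescript
// A rough check of the numbers above. All inputs are the estimates quoted
// in the post, not measured data.
const subscribers = 5_000_000;        // World of Warcraft subscribers
const hoursPerWeek = 12;              // low end of the 12-21 h/week estimate
const weeksPerYear = 52;

const manHoursPerYear = subscribers * hoursPerWeek * weeksPerYear;
// ≈ 3.12 billion man-hours per year

const hoursPerManYear = 2000;         // a typical Western work year
const equivalentWorkforce = manHoursPerYear / hoursPerManYear;
// ≈ 1.56 million full-time workers

const valuePerHour = 0.30;            // hypothetical dollars earned per played hour
const hoursPerMonth = (hoursPerWeek * weeksPerYear) / 12;  // ≈ 52 hours
const valuePerPlayerPerMonth = valuePerHour * hoursPerMonth;
// ≈ $15.60 per player per month, roughly the subscription fee

console.log({ manHoursPerYear, equivalentWorkforce, valuePerPlayerPerMonth });
```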

New blog

I’ve started a new blog directly under hjalli.com.

It will be a bit more general and there will be posts in both English and Icelandic as suits each subject.

It’s still very fresh and immature, but I hope to post some interesting thoughts there on Spurl.net, web search and the internet as well as some “old-fashioned” wetware stuff. The Icelandic posts will probably be more general web and technology discussions as well as some personal stuff.

Hope to see you there…

Spurl launches an Icelandic search engine

Originally posted on the Spurl.net forums on November 1st 2005

Spurl.net just launched an Icelandic search engine with the leading local portal / news site mbl.is. Although Iceland is small, mbl.is is a nice, mid-size portal with some 210,000 unique visitors per week – so this has significant exposure.

The search engine is called Embla – it’s a pun on the portal’s name (e mbl a) and the name of the first woman created by the Gods in Norse Mythology.

An Icelandic search engine may seem a little off topic for many of you Spurl.net users, but it is relevant for several reasons:

  • We have been busy for some 4 months now building the next version of Zniff. Embla uses this new code, and it will make its way into both Spurl.net (for searching your own spurls) and Zniff (for searching the rest of the Web) in the coming weeks. The engine is now reliable, scalable, redundant and compliant with a lot of other .com buzzwords – with more relevant search results and sub-second response times on most queries.
  • This commercial arrangement with mbl.is (hopefully the first of many similar) gives us nice financial footing for further development of Spurl.net and related products.
  • Embla uses information from Spurl.net users. Even though only a small portion of our users are Icelandic, we’re using their information as one of the core elements for ranking the search results – and it works well. This strengthens our belief in the “human search engine” concept, both for other markets and for the international playing field.

I plan to write more on the search engine itself on my blog soon, but I just wanted to mention briefly that with Embla we’re also breaking new ground in another territory: Embla “knows” Icelandic. Most search engines and technologies are of English origin, and from a search standpoint, English is a very simple language. Most English words have only a couple of word forms (such as “house” and “houses”), and while some search engines use stemming (at least sometimes), it doesn’t matter all that much for English. Many languages – including Icelandic – are far more complicated. Some words can – hypothetically – have more than 100 different word forms. In reality a noun will commonly have about 12-16 unique word forms. Now THIS really matters. The difference in the number of returned results is sometimes 6 to 10-fold, and it improves relevance as well.

We have built Embla so that it searches for all forms of the user’s search words. We also offer spelling corrections for Icelandic words, based on the same lexicon. The data for this comes from the Institute of Lexicography at the University of Iceland, but the methodology and technology are ours all the way and can (and will) be used for other languages as well. Quite cool stuff actually – as mentioned, I will write more about that on the blog soon.
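To make the word-form idea a bit more concrete, here is a minimal sketch of what query expansion over an inflection lexicon could look like. The lexicon entry below is just an illustrative stand-in; Embla’s actual data comes from the Institute of Lexicography and its implementation is not shown here.

```typescript
// A minimal sketch of searching for all word forms. The lexicon is a made-up
// stand-in: lemma -> inflected forms (tiny subset for "hestur", "horse").
const lexicon: Record<string, string[]> = {
  hestur: ["hestur", "hest", "hesti", "hests", "hestar", "hesta", "hestum"],
};

// Map any form back to its lemma so a query term can be expanded to every form.
const formToLemma = new Map<string, string>();
for (const [lemma, forms] of Object.entries(lexicon)) {
  for (const form of forms) formToLemma.set(form, lemma);
}

function expandQueryTerm(term: string): string[] {
  const lemma = formToLemma.get(term.toLowerCase());
  return lemma ? lexicon[lemma] : [term]; // unknown words pass through as-is
}

// "hesti" (dative) now matches documents containing any form of the word.
console.log(expandQueryTerm("hesti"));
// -> ["hestur", "hest", "hesti", "hests", "hestar", "hesta", "hestum"]
```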

Using AJAX to track user behavior

Here’s a thought. With the rise of AJAX applications this is bound to happen, and it may very well have been implemented somewhere already, even though a quick search didn’t reveal much. So here goes:

By adding a few clever Javascript events to a web page, it is possible to track user behavior far more closely than with the typical methods.

Similar methods have long been used on some sites (including my own Spurl.net) to track clicks on links. A very simple script could also log how far down a page a user scrolls, which elements he hovers the mouse over, and even infer how long he spends looking at different parts of the page – and probably several other things usability experts and web publishers would kill (or at least seriously injure) to learn.
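To illustrate, here is a rough sketch of what such a script could look like, assuming a hypothetical /track endpoint on the publisher’s own server (the sendBeacon call is a modern convenience; the same could be done with an image request or XMLHttpRequest):

```typescript
// A rough sketch of the tracking described above. The /track endpoint is
// hypothetical; any collecting URL on the publisher's server would do.
interface PageStats {
  maxScrollDepth: number;            // furthest scroll position, as a fraction of page height
  hovered: Record<string, number>;   // element id -> hover count
  startedAt: number;
}

const stats: PageStats = { maxScrollDepth: 0, hovered: {}, startedAt: Date.now() };

// How far down the page has the user scrolled?
window.addEventListener("scroll", () => {
  const depth =
    (window.scrollY + window.innerHeight) / document.documentElement.scrollHeight;
  stats.maxScrollDepth = Math.max(stats.maxScrollDepth, depth);
});

// Which elements does the mouse pass over?
document.addEventListener("mouseover", (e) => {
  const id = (e.target as HTMLElement).id;
  if (id) stats.hovered[id] = (stats.hovered[id] ?? 0) + 1;
});

// Send everything to the server when the user leaves the page.
window.addEventListener("beforeunload", () => {
  const payload = JSON.stringify({
    ...stats,
    secondsOnPage: (Date.now() - stats.startedAt) / 1000,
    url: location.href,
  });
  navigator.sendBeacon("/track", payload);
});
```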

Even though it’s nowhere near as precise as the famous eye-tracking heatmaps (here’s the one for Google’s result pages), it provides web publishers with a cheap way to better position ads and user interface elements, experiment with different layouts, etc.

Does anybody know of implementations like this? Does it raise some new privacy concerns?

P.S.: I just saw that Jep Castelein of Backbase posted some related thoughts this summer.

Google and user driven search indexes

Needless to say, I am a firm believer that the next big steps in web search will come from involving the users more in the ranking and indexing process.

The best search engines on the web have always been built around human information. In the early days, Yahoo! was the king, first based on Jerry Yang’s bookmark collection and later on herds of human editors categorizing the web, building on top of Jerry’s collection. The web’s exponential growth soon overwhelmed this model, and for a year or so, the best ways to search the web involved trying to find one or two useful results in full pages of porn and casino links on AltaVista or HotBot.

Then Google came along and changed everything with their ingenious use of the PageRank algorithm, treating a link from one web page to another as a vote for the quality and relevance of the content of the linked page. You know the story. Today, all the big engines use variations of this very same core methodology, obviously with a lot of other things attached to fight spam and otherwise improve the rankings.

Next steps: From page content to user behavior

But the next step is due and the big guys in the search world are really starting to realize this. A lot of the recent innovations coming from the search engines – especially from Google – are targeted directly at gathering more data about user behavior that they can in turn use to improve web search.

Among the behavior information that could be used for such improvements are:

  • Bookmarks: A bookmark is probably the most direct way a user can say “I think this page is of importance”. Additionally, users usually categorize things that they bookmark, and some browsers and services – such as my very own Spurl.net – make bookmarks more useful by allowing users to add various kinds of meta data about the pages they are bookmarking. More on the use of that in my recent “Coming to terms with tags” post.
  • History: What pages has the user visited? The mere fact that a user visited a page does not say anything about it other than that the page owner somehow managed to lure the user there (usually with good intentions). The fact that a page is frequently visited (or even visited at all) does however tell the search engine that this page is worth visiting and indexing. This ensures that fresh and new content gets discovered quickly and that successful black-hat SEO tricks (in other words, “search engine spam”) don’t go unnoticed for long.
  • Navigational behavior: Things such as:
    • which links users click
    • which links are never clicked
    • how far down do users scroll on this page
    • how long does a user stay on a page
    • do users mainly navigate within the site or do they visit external links
    • etc.

All of these things help the search engine determine the importance, quality and freshness of the page – or even of individual parts of the page.
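To be clear about what “help determine” might mean in practice, here is a toy illustration of folding a few behavior signals into a ranking score. The signals chosen and the weights are entirely made up; this is not Google’s method, just a sketch of the general idea.

```typescript
// An illustrative (entirely made-up) way behavior signals could be folded
// into a page's ranking score; real engines combine far more signals.
interface BehaviorSignals {
  bookmarks: number;        // times users bookmarked the page
  visitsPerDay: number;     // observed traffic
  avgSecondsOnPage: number; // dwell time
  clickThroughRate: number; // clicks / impressions on result pages, 0..1
}

function behaviorBoost(s: BehaviorSignals): number {
  // log-scale the counts so a handful of heavy users can't dominate
  const bookmarkScore = Math.log1p(s.bookmarks);
  const trafficScore = Math.log1p(s.visitsPerDay);
  const dwellScore = Math.min(s.avgSecondsOnPage / 60, 5); // cap at 5 minutes
  return 1 + 0.3 * bookmarkScore + 0.2 * trafficScore
           + 0.1 * dwellScore + 0.4 * s.clickThroughRate;
}

// Used as a multiplier on top of whatever content/link-based score exists.
const finalScore = (contentScore: number, s: BehaviorSignals) =>
  contentScore * behaviorBoost(s);
```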

All of these things and a lot more are mentioned in one way or another in Google’s recent patent application. Check this great analysis of the patent for details. Here’s a snip about user data:

  • Traffic to a document is recorded and monitored for changes (possibly through the toolbar, or desktop searches of cache and history files) (sections 34, 35)
  • User behavior on websites is monitored and recorded for changes (click-through, back button etc.) (sections 36, 37)
  • User behavior is monitored through bookmarks, cache, favorites, and temp files (possibly through Google Toolbar or Desktop Search) (section 46)
  • Bookmarks and favorites are monitored for both additions and deletions (sections 0114, 0115)
  • User behavior for documents is monitored for trend changes (section 47)
  • The time a user spends on a website may be used to indicate a document’s quality or freshness (section 0094)

Google’s ways of gathering data

And how exactly will Google gather this data? Simple: by giving users useful tools such as the Toolbar, Deskbar and Desktop Search that improve their online experience and, as a side effect, provide Google with useful information such as the above.

I’ve dug through the privacy policies for these tools (Toolbar, Desktop Search, Deskbar) and it’s not easy to see exactly what data is sent to Google in all cases. The Toolbar does send them the URL of every visited page to provide the PageRank and a few other services. It is opt-out but on by default, so they have the browser history of almost every user that has the toolbar set up. Other than this, details are vague. Here are two snips from the Desktop Search privacy FAQ:

“So that we can continuously improve Google Desktop Search, the application sends non-personal information about things like the application’s performance and reliability to Google. You can choose to turn this feature off in your Desktop Preferences.”

“If you send us non-personal information about your Google Desktop Search use, we may be able to make Google services work better by associating this information with other Google services you use and vice versa. This association is based on your Google cookie. You can opt out of sending such non-personal information to Google during the installation process or from the application preferences at any time.”

It sounds like error and crash reporting, but it does not rule out that other things – like bookmarks, time spent on pages, etc. – are sent as well. None of the above-mentioned products provides a definitive, detailed list of what is sent to Google.

Next up: Google delivering the Web

And then came the Web Accelerator. Others have written about the privacy, security and sometimes actually damaging behavior of the Web Accelerator, but as this blog post points out, the real reason they are providing it is to monitor online behavior by becoming a proxy service for as many internet users as possible.

I’m actually a bit surprised by the Web Accelerator for several reasons. Google is historically known for the excellent quality of their products, but this one seems to have been publicly released way too early, judging from the privacy, security and bug reports coming from all over the place. Secondly, the value is questionable. I’m all for saving time, but if a couple of seconds per hour open up a range of privacy and security issues, I’d rather just live a few days shorter (if I save 2 seconds every hour for the next 50 years, that amounts to 10 days) 🙂

The third and biggest surprise comes from the fact that I thought Google had already bought into this kind of information with their fiber investments and the acquisition of Urchin. Yes, I’m pretty sure that the fiber investment was made to be able to monitor web traffic over their fat pipes for the reasons detailed above – even as a “dumb” bandwidth provider. Similar things are already done by companies such as WebSideStory and Hitwise that buy traffic information from big ISPs.

Urchin’s statistics tools can be used for the same purpose in various ways – I’m pretty sure Google will find a way to make it worth an administrator’s trouble to share his Urchin reports with Google for some value-add. So why bother with the Web Accelerator proxy play?

Google already knows the content of every web page on the visible web, now they want to know how we’re all using it.

Good intentions, bad communication

Don’t get me wrong. I’m sure that gathering all this data will dramatically improve web search, and I marvel at the clever technologies and methods in use. What bothers me a bit is that Google is not coming clean about how they will use all this data, or even about whether – and to what extent – they are gathering it.

I can see only two possible reasons for this lack of communication. The first is that they don’t want to tell their competitors what they are doing. That can’t be the case – Yahoo!, MSN and Ask Jeeves surely have whole divisions that do nothing but reverse engineer Google products, analyze their patent filings and so on – so that explanation would just be naive. The second reason I can think of is that they are afraid that users will not be quite as willing to share this information if it becomes clearly visible how massive their data gathering efforts have become.

I’m not an extremely privacy-concerned person myself, but I respect, understand and support the opinions of those who are. The amount of user data being gathered will gradually cause more people to wonder what would happen if Google turned evil.

– – –

Hjalmar Gislason is the founder of Spurl.net. The above reflects some of the things his company is doing with the Spurl.net bookmarking system and the Zniff search engine.

Coming to terms with tags: folksonomies, tagging systems and human information

Over the past few years we’ve seen a big movement from hierarchical categories to flat search. Web navigation and email offer prime examples: Yahoo!’s directory gave way to Google’s search, and Outlook’s folders are giving way to the search-based Gmail. It’s far more efficient to come up with and type in a few relevant terms for the page or subject you’re looking for than it is to navigate a hierarchy – a hierarchy that may even have been built by somebody else, somebody who probably has a different mindset and therefore categorizes things differently than you would.

Lately, tags have become the hot topic when discussing information organization (or – some might say – the lack thereof). But tagging and flat search are really just two sides of the same coin.

The main reason tagging is so useful is that it resembles and improves search. It assigns relevant terms to web pages, images or email messages, making it easier to find them by typing in one or more of those relevant terms or selecting them from a list. Furthermore, tag-based systems assign YOUR terms to a resource – based on your mindset, not somebody else’s – making them extra useful for managing one’s own information. The more terms you assign to the resource initially, the more likely it is that one of them comes to mind when you’re looking it up again later on.

But you also want to find new information, not only what you’ve already found or seen. And again, tagging comes to the rescue. If somebody else has found a term relevant for a page, it’s likely that you will too. The more that person’s mindset (read: use of tags, range of interests and other behavior) resembles yours, the more likely their terms are to be of use to you. Furthermore, the more people agree on assigning the same tag to a resource, the more relevant we can assume that tag to be for the resource.

Tags don’t tell the whole story

Tags are wonderful, but they won’t cure cancer all by themselves.

When tagging a resource – especially in these early days of tagging systems – it is quite likely that you are the first person to tag it. Think as hard as you can and you will still not come up with all the terms that you would nevertheless agree are highly relevant to the subject – terms that you might even think of later when you want to look it up again. It is therefore important to use other methods as well, when possible, to help assign relevant terms to the resource.

Firstly, there are other ways than tagging to help users identify relevant terms. By allowing them to highlight the snips from the text that they find most interesting, you can give extra weight to that text. Free-form descriptions of the resource are another way. Here you can see an example of a page that has been highlighted using tags, snips and other information from Spurl.net users (another example here).

Secondly, there is the author’s own information, and that should (usually) count for something. In the case of a web page you have the page text and its markup. The markup identifies things that the page author found to be important, among them:

  • The page title
  • Headings
  • Bolded and otherwise emphasized text
  • Meta keywords, descriptions and other meta information
  • Position on the page (things near the top are usually more relevant than things at the bottom)

Analyzing all this text information (both the user’s and the author’s) in relation to expected word frequency identifies more relevant terms than the tags alone.
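As an illustration of that analysis, here is a toy sketch that pulls candidate terms from a page’s own markup and weights them by where they appear. The weights are arbitrary; a real system would also normalize against expected word frequency and merge in the user-supplied tags and snips.

```typescript
// A toy sketch of extracting candidate terms from a page's markup and
// weighting them by where they appear. Weights are arbitrary illustrations.
function termsFromMarkup(html: string): Map<string, number> {
  const doc = new DOMParser().parseFromString(html, "text/html");
  const weights = new Map<string, number>();

  const add = (text: string | null | undefined, weight: number) => {
    for (const word of (text ?? "").toLowerCase().split(/\W+/)) {
      if (word.length > 2) weights.set(word, (weights.get(word) ?? 0) + weight);
    }
  };

  add(doc.title, 5);                                            // page title
  doc.querySelectorAll("h1, h2, h3").forEach(h => add(h.textContent, 3));   // headings
  doc.querySelectorAll("b, strong, em").forEach(e => add(e.textContent, 2)); // emphasized text
  add(doc.querySelector('meta[name="keywords"]')
        ?.getAttribute("content"), 2);                          // meta keywords
  add(doc.body.textContent, 1);                                 // everything else

  return weights; // merge with user tags/snips and normalize by word frequency
}
```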

And there are even more ways to get relevant terms: link analysis (words that appear in and around links to the document, and the relevancy of the linking documents), etc. Basically all the methods that Google-like engines are already using to index pages with spiders and other automated methods.

Relevant terms are a consensus

That said, relevant terms for a resource must be seen as a consensus. A consensus between the user himself, other users, the page author and the “rest of the web”.

If all of these players are roughly in agreement, all is well and the term relevancy can be trusted. If the information from the page’s author doesn’t agree at all with the information from the users, it is likely that the author is trying to trick somebody – most likely an innocent search engine bot that wanders by – and the author’s information should be given less weight than otherwise.

When searching for something, you don’t want to end up empty-handed just because nobody had thought of tagging a resource with the terms that you entered. The relevant terms must be seen as layers of information, where your tags and other terms that you have assigned play a central role; then comes information from other users – giving extra weight to those that are “similar” to you (based on comparing previous behavior) – then the page author, and finally the “rest of the web”.
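A minimal sketch of that layering, with placeholder weights (not anything Spurl.net or Zniff actually uses), might look like this:

```typescript
// A minimal sketch of the "layers of information" idea: the same term can be
// suggested by several parties, each contributing with a different weight.
// The weights are placeholders, purely for illustration.
type Source = "self" | "similarUser" | "otherUser" | "author" | "restOfWeb";

const layerWeight: Record<Source, number> = {
  self: 1.0,          // your own tags and snips
  similarUser: 0.7,   // users whose behavior resembles yours
  otherUser: 0.4,
  author: 0.2,        // page title, headings, meta information
  restOfWeb: 0.1,     // anchor text of incoming links, etc.
};

function termRelevance(votes: { term: string; source: Source }[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const v of votes) {
    scores.set(v.term, (scores.get(v.term) ?? 0) + layerWeight[v.source]);
  }
  return scores;
}
```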

Trust systems

In addition, you need trust systems to prevent the search from being “gamed”. Trust should be assigned to all players in the chain: users, authors and linking web pages. If a long-time, consistent and otherwise well-behaving user assigns a tag to a page, we automatically give it high weight. If a new user includes a lot of irrelevant information on his first post, his trust is ruined and will take a long time to be regained.

Users gradually build trust within the system and if they start to misbehave, it affects everything that they’ve done before. Same goes for web page authors and linking pages. If your information is in no way in line with the rest of the consensus, your information is simply ignored. Providing relevant information is the only way to improve your ranking. (more in this post on Spurl.net’s user forums).
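Here is a hedged sketch of that kind of trust bookkeeping: trust is earned slowly through contributions that agree with the consensus and lost quickly otherwise, and low-trust contributions are simply ignored. The update rule and constants are illustrative only.

```typescript
// An illustrative trust update: slow to earn, quick to lose.
// The rule and constants are made up, not Spurl.net's actual system.
interface Contributor {
  trust: number; // 0..1, new users start low
}

function updateTrust(c: Contributor, agreesWithConsensus: boolean): void {
  if (agreesWithConsensus) {
    c.trust = Math.min(1, c.trust + 0.01);   // trust is earned slowly
  } else {
    c.trust = Math.max(0, c.trust * 0.5);    // and lost quickly
  }
}

// A contribution's weight in the consensus is scaled by trust;
// below a threshold it is ignored altogether.
const contributionWeight = (c: Contributor, baseWeight: number) =>
  c.trust < 0.1 ? 0 : baseWeight * c.trust;
```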

On the author layer, this could even resurrect the long-dead keyword and description meta tags, which are – after all – a good idea; they were just too blatantly misused.

Coming to terms with tags

The “folksonomy” discussion has so far been largely centered on user-assigned tags and tagging systems, but these are just part of a bigger picture.

Relevant terms – as explained above – are really what we are after, and it just so happens that tags are the most direct way to assign new relevant terms to a resource. But they are far from the only way. Tagging is not “the new search”; it is one of many ways to improve current search methods. A far bigger impact is coming from the fact that we’re starting to see human information – tags being one of many possible sources – brought into the search play.

– – –

Hjalmar Gislason is the founder of Spurl.net. The above reflects some of the things his company is doing with the Spurl.net bookmarking system and the Zniff search engine.

Is search the “next spam”?

I was a panelist at a local conference on Internet marketing here in Iceland last Friday.

As you can imagine, a lot of the time was spent talking about marketing through search engines – both how to use paid-for placement (i.e. ads) and how to optimize pages to rank better in the natural search results.

The latter, usually called “search engine optimization” or just SEO, has a somewhat lousy reputation. More or less everybody who owns a domain has gotten a mail titled something like “Top 10 ranking guaranteed” from people who claim they can work some magic on your site (or off it) that will get you to the top of the natural search results for the terms you want to lead people to your site.

Needless to say, these people are frauds. If the spam isn’t enough of a giveaway, there are two additional reasons. First of all, all the major search engines guard their algorithms fiercely and change them frequently, exactly to fight off people trying to game their systems in this way. Secondly, they will probably choose an oddball phrase (see the story of Nigritude Ultramarine) that nobody else is using and have you put that on your site. Of course it will get you a top-10 ranking – you’re the only one using it.

Additionally, some of these “respectable professionals” run so-called link farms: huge infinite loops of pages linking to each other in order to give the impression that any one of these pages is highly popular because it has so many incoming links. The link farm pages (here is an OLD blog entry I did on that) are usually machine generated and make little or no sense to humans, but may trick the search engines – for a while. When the search engine engineers find out, however, they will get in a foul mood and are likely to punish or even ban all the pages involved – and then you would have been better off without them in the first place!

There is a range of other tricks that bad SEOs use, but the results from using them are usually the same: It will rank you high for a while, and then have the opposite effect.

On the other hand, there are a number of things you should do to make it easier for search engines to understand what your site is all about, without playing dirty. Most of them are exactly the same things that make a page more usable for humans: descriptive page titles, informative link texts, proper use of the terms you want people to find your site by and, last but not least, good and relevant information on your subject. If you have a highly usable and informative site, people will link to you as an information source, and that gives you quality incoming links from respectable sites – one of the most important factors for ranking well, especially at Google.

Enough about that – others have written much better introductions to search engine marketing, and a lot of respectable people make a living out of giving healthy tips and consulting on these issues.

The main point is that in the search engine world, there is a constant fight between the bad SEOs and the engineers at the search engine companies.

During dinner after the conference, I had a very interesting chat with one of the speakers, who expressed his concern that “search was the next spam” – meaning not that people will try to game the search engines (they have done so since way back in the 90s), but that they will get so good at it that it will start to hurt the search business in the same way email spam has hurt email.

As mentioned before, there are two ways to get exposure in search engine results: paid-for placement and natural search results. With paid search estimated to be a 2.2 billion dollar industry last year, there is a lot of incentive for a lot of clever (yet maybe a little immoral) engineers to try their best to rank high in the natural results. The natural results are a lot more likely to get clicked and therefore more valuable than the paid ones – anybody want to estimate a number?

And the evidence is out there. Even Google sometimes gives me results that are obviously there for the “wrong reason”. Other engines do it more frequently. And even with the “non-spam” results there has been talk recently about the majority of search results being commercial – even in fields where you would expect to get both commercial and non-commercial results.

The paid-for placement results are of course commercial by nature – no problem there, as long as they are clearly marked as such. The natural results, on the other hand, should reflect the best information on the web – commercial or not. The problem is that commercial sites are usually made by professionals, and by now most of them pay at least some attention to optimization that will help them get a higher ranking in the search engines. Not necessarily the bad tricks, just the good ones. And therefore the majority of the natural results are, and will be, commercial too.

Search engine indexes, populated by machine bots – engineering marvels as they are – simply cannot make the needed distinction here. The ones that can tell what people will find to be good results are – you guessed it – other people.

Human information will become increasingly important in the fight against spam and to keep search engine results free of commercial bias. Human information was what created Yahoo! in the beginning. It was also the brilliance of Google’s PageRank and link analysis when they began – they were tapping into human information: links that people (webmasters) created were treated as a “vote” for, and meta information about, the pages they linked to. This is what the search engines saw as a major asset in the blogging community, and this is why humanly created indexes like the ones constantly growing at bookmarking services such as my Spurl.net, del.icio.us and LookSmart’s Furl will become major assets in the search industry in the coming months and years. And with a decent reputation / trust system (think Slashdot), it will be relatively easy to keep the spammers out – at least for a while.

The Origins of Our Ideas

“Tell me what you read and I shall tell you what you are” is an anonymous play on a famous proverb.

Every day we take in a lot of information from a variety of sources. This information shapes our ideas, opinions and to some extent our personality. But where is it coming from?

Most of us don’t pay a lot of attention to this. We believe we make up our own opinions about the things that matter to us and leave the rest to professional and/or self-proclaimed pundits.

Among the media that shape our opinions are movies, TV, radio, newspapers, magazines, books and of course (and largely, for many of us) the things we read on the Web. If we somehow kept track of all the content that we “consume” this way, we could – at least to some extent – trace the origins of our ideas: who it is that really shapes our opinions and views of the world. I think it would be amazing to see how uniform these origins are for most of us in the Western world. And speaking of the Western world, it could be argued that it is the origins of our ideas that define a culture and separate one from another.

Tracing ideas
In the last couple of years, tools have been emerging that make it possible to trace the origin of ideas more than ever before. This is largely due to the increase in “personal publishing” with blogs and blogging tools. So far the tools are almost entirely limited to the blogosphere, but it’s a start.

One of the defining things about a blog is extensive linking to sources and to information related to the topic in question. The TrackBack functionality of some blogging systems, such as Movable Type, makes those links two-way, so that not only do my readers know where my ideas are coming from, they can also see who’s catching my train of thought and continuing the discussion, linking to me. This allows for some traceability of ideas from one weblog to the other.
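For those who haven’t seen it, a TrackBack ping is just a small form-encoded HTTP POST from the citing blog to the cited post’s TrackBack URL. The sketch below follows the published TrackBack spec as I recall it (parameter names url, title, excerpt and blog_name); treat it as an approximation rather than a reference implementation.

```typescript
// A rough sketch of sending a TrackBack ping from the citing blog to the
// cited post's TrackBack URL. Parameter names follow the TrackBack spec as
// I recall it; check the spec before relying on this.
async function sendTrackBack(trackbackUrl: string, entry: {
  url: string;        // permalink of the post that is doing the citing
  title?: string;
  excerpt?: string;
  blog_name?: string;
}): Promise<boolean> {
  const body = new URLSearchParams();
  body.set("url", entry.url);
  if (entry.title) body.set("title", entry.title);
  if (entry.excerpt) body.set("excerpt", entry.excerpt);
  if (entry.blog_name) body.set("blog_name", entry.blog_name);

  const res = await fetch(trackbackUrl, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  });

  // The response is a tiny XML document; <error>0</error> means success.
  const xml = await res.text();
  return res.ok && /<error>\s*0\s*<\/error>/.test(xml);
}
```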

But this is nowhere near sufficient. The TrackBack functionality is not that widely used and therefore covers only a minuscule part of all the content that appears online every day, let alone all the content that is already out there.

Many people have spotted this and you can now find several tools on the web that are dedicated to tracing links to and from web pages, especially weblogs. I think Blogdex was first in that line (coming out of MIT’s Media Lab), but others followed such as DayPop, Popdex and last but not least Technorati, probably the largest such index now tracking almost 3 million weblogs on a daily basis.

These tools allow you to enter a URL or domain and see who has been linking to it. As an example, I can see who’s been linking to Spurl lately. Interestingly enough, none of these tools seems to let you see a list of the URLs a given weblog has been linking TO – only who has been linking TO IT. Such a list would be very useful for doing a quick “background check” on a blog you come across, as ideas generally flow in the opposite direction of the link: when I link to a news article on BBC, that almost certainly means I’ve read the article. Instead I suspect that the majority of searches on Technorati and co. are self-centered (ok, self-conscious) bloggers trying to find out if anyone is linking to them. It should be very simple for the Pop-Day-Dex-Ratis to add this functionality.

Keeping track of the consumption
Obviously not everybody has a weblog, and even those who do don’t write about everything they read. In fact, it is interesting how bad the browsers we use every day are at helping us keep track of the information we consume with them. Ever spent frustrating minutes or hours on Google or in the browser’s history list trying to find again that vital piece of information you read the other day?

Even the bookmarks / favorites that are designed to capture those “What a great page!” moments are very limited. They are only stored locally, so you don’t have access to your home bookmarks at work or in school, and they usually get lost when the computer is upgraded or the hard drive crashes. In some browsers, bookmarks cannot even be searched, let alone searched using a full-text search of the actual contents of the marked pages. The result is that few people use bookmarks except maybe as a shortcut to a dozen or two of their most used web sites.

This problem has also been spotted, and in the last few months tools have been emerging that far exceed the traditional bookmarks in terms of functionality and usability, and even tap into some of the interesting social aspects that emerge from the fact that thousands of other users are using the same tools. Among these tools are Spurl.net (of which I am a founder), del.icio.us, Furl, Simpy and a few others.

To a varying degree, these tools offer functionality such as full-text search of all the pages a user has marked, storing the entire contents of a marked page (addressing the “linkrot” problem), browser sidebars and toolbars for quick access, recommendations by matching user profiles, and of course access from any Internet-connected computer (there is a lot more). Combined, this makes the tools not only a replacement for traditional bookmarks, but rather a permanent record of the content a user has consumed during his or her browsing. The user can then use this record almost as a memory augmentation to later recall the information.

And there we have another piece in the puzzle that can help us see where our ideas are coming from.

Drawing the flow
Stephen VanDyke wrote a very interesting blog entry at the beginning of March, touching on the same subject: “How News Travels on the Internet”. He even drew a flow chart of how he sees the flow of news online.

Obviously the flowchart only shows a fraction of the entire news flow, and the graph is rather centered on the blogging world, leaving out some of the more traditional ways news travels, such as press releases going from the Source directly to the Traditional “Big” Online Media or even all the way to the Offline Media. A very thought-provoking graph nonetheless.

The material that travels in the “Dark Matter” is especially interesting, i.e. emails, chat and instant messages where people send simple notes saying “Hey, check this out.” In the end, what captures the attention of the Big Media is largely coincidence, but as they reach a far larger and broader group, their coverage feeds into the blogosphere again and the circle can even repeat itself.

Another interesting attempt at drawing the flow of ideas is the Blog Epidemic Analyzer from HP. The project has indexed the flow of some 20,000 URLs as they go through the blogosphere from one blog to the next, spawning more blogging until they gradually fade from the discussion (and a new “hottest thing” emerges).

All the way to “The Source”
…sounds a bit like the plot for Matrix 4 – doesn’t it?

What I dream of is that one day – using tools similar to those described above – I can find out where the news and other information I’m consuming really originates. That way I can see if my sources are colored by the influence of a single media company, religious group or government; the Bush administration or Michael Moore; the WWF for Nature or Texaco; the fans of the Pistons or the Lakers.

And if my sources turn out to be uniform, I can at least make sure that it is on purpose, or spend a little time studying the arguments of the opposing party. Then I can say in good faith that my opinions were formed after taking the arguments of both sides into account and making up my own mind – not deliberately or accidentally forced upon me by a like-minded group of people with a single view on things.

There is still a long way to go, but these tools and other similar attempts hint that this may very well be an achievable goal.