
hjalli.com – Hjálmar Gíslason

Technology and other wonders / Tækni og fleiri undur veraldar


Coming to terms with tags: folksonomies, tagging systems and human information

April 13, 2005 by Hjalmar Gislason

Over the past few years we’ve seen a big movement from hierarchical categories to flat search. Web navigation and email offer prime examples: Yahoo’s directories gave way to Google’s search, and Outlook’s folders are giving way to the search-based Gmail. It’s far more efficient to come up with and type in a few relevant terms for the page or subject you’re looking for than to navigate a hierarchy, one that may even have been built by somebody else, somebody who probably has a different mindset and therefore categorizes things differently than you would.

Lately, tags have become the hot topic when discussing information organization (or – some might say – the lack thereof). But tagging and flat search are really just two sides of the same coin.

The main reason tagging is so useful is that it resembles and improves search. It assigns relevant terms to web pages, images or email messages, making them easier to find by typing in one or more of those terms or selecting them from a list. Furthermore, tag-based systems assign YOUR terms to a resource, based on your mindset rather than somebody else’s, which makes them especially useful for managing your own information. The more terms you assign to a resource initially, the more likely it is that one of them comes to mind when you look it up again later.
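To make the tagging-as-search point concrete, here is a minimal sketch of a personal tag index in Python. The class `TagIndex` and its methods are hypothetical names chosen for illustration, not Spurl.net’s implementation: each tag maps to the set of resources it was assigned to, and a search with several terms returns only the resources that carry all of them.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: each tag maps to the set of resources carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    @staticmethod
    def _norm(term):
        # Normalize so "Search" and "search " count as the same tag.
        return term.strip().lower()

    def tag(self, resource, *tags):
        for t in tags:
            self._by_tag[self._norm(t)].add(resource)

    def search(self, *terms):
        """Return the resources tagged with every given term (AND), sorted."""
        matches = [self._by_tag.get(self._norm(t), set()) for t in terms]
        if not matches:
            return []
        return sorted(set.intersection(*matches))


index = TagIndex()
index.tag("http://example.com/folksonomy", "tags", "folksonomy", "search")
index.tag("http://example.com/gmail", "email", "Search")
print(index.search("search"))
# ['http://example.com/folksonomy', 'http://example.com/gmail']
print(index.search("Search", "tags"))
# ['http://example.com/folksonomy']
```

Each extra tag you assign is one more key under which the resource can be found later, which is exactly why tagging more terms up front pays off.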

But you also want to find new information, not only what you’ve already found or seen. And again, tagging comes to the rescue. If somebody else has found a term relevant to a page, chances are you will too. The more that person’s mindset (read: use of tags, range of interests and other behavior) resembles yours, the more likely the term is to be useful to you. Furthermore, the more people agree on assigning the same tag to a resource, the more relevant we can assume that tag to be for the resource.

Tags don’t tell the whole story

Tags are wonderful, but they won’t cure cancer all by themselves.

When tagging a resource, especially in these early days of tagging systems, it is quite likely that you are the first person to tag it. However hard you think, you will not come up with every term that you would nevertheless agree is highly relevant to the subject, including terms that may come to mind later, when you want to look it up again. It is therefore important to use other methods as well, where possible, to help assign relevant terms to the resource.

Firstly, there are ways other than tagging to help users identify relevant terms. By letting them highlight the snips of text they find most interesting, you can give extra weight to that text. Free-form descriptions of the resource are another way. Here you can see an example of a page that has been highlighted using tags, snips and other information from Spurl.net users (another example here).

Secondly, there is the author’s own information, and that should (usually) count for something. In the case of a web page, you have the page text and its markup. The markup identifies things that the page author found important, among them:

  • The page title
  • Headings
  • Bolded and otherwise emphasized text
  • Meta keywords, descriptions and other meta information
  • Position: text near the top is usually more relevant than text at the bottom

Analyzing all this text information (both the user’s and the author’s) in relation to expected word frequency identifies more relevant terms than the tags alone.
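One way to combine the author’s markup and the users’ input with expected word frequency is a field-weighted variant of TF-IDF, sketched below under stated assumptions: the field names, the weights in `FIELD_WEIGHTS` and the document-frequency table `doc_freq` are all hypothetical, and the sketch ignores where in the page a term appears.

```python
import math
from collections import Counter

# Hypothetical field weights: markup the author emphasized, and text users
# highlighted, count more than plain body text.
FIELD_WEIGHTS = {
    "title": 5.0,
    "headings": 3.0,
    "emphasis": 2.0,
    "meta": 1.5,
    "user_snips": 3.0,
    "body": 1.0,
}


def weighted_counts(fields):
    """Count each word occurrence, scaled by the weight of its field."""
    counts = Counter()
    for field, text in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for word in text.lower().split():
            counts[word] += weight
    return counts


def relevant_terms(fields, doc_freq, total_docs, top_n=5):
    """Rank words by weighted count times an IDF-style rarity factor.

    doc_freq[word] is how many of total_docs corpus documents contain the
    word; it stands in for the word's expected frequency.
    """
    counts = weighted_counts(fields)
    scores = {
        word: count * math.log((1 + total_docs) / (1 + doc_freq.get(word, 0)))
        for word, count in counts.items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


page = {"title": "tagging and search", "body": "search and the web and tags"}
doc_freq = {"and": 990, "the": 1000, "search": 100, "tagging": 20, "web": 300, "tags": 30}
print(relevant_terms(page, doc_freq, total_docs=1000, top_n=3))
# ['tagging', 'search', 'tags']
```

Words missing from `doc_freq` get the largest rarity factor here, so a real system would need a corpus-wide frequency table (or a stopword list) rather than the tiny dictionary in this example.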

There are even more ways to get relevant terms, such as link analysis (the words that appear in and around links to the document, and the relevance of the linking documents). Basically, any of the methods that Google-like engines already use to index pages with spiders and other automated tools can contribute.
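Link analysis can be sketched in the same spirit. The example below uses only Python’s standard `html.parser` module to collect the anchor text of links pointing at a target URL, weighting each word by a score for the linking page. The class and function names and the `page_score` values are assumptions for illustration; the sketch only looks at words inside the links, not the text around them, and takes the linking-page scores as given rather than computing them.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the words inside <a> elements whose href equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.inside = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.words.extend(data.lower().split())


def anchor_terms(pages, target):
    """pages: (html, page_score) pairs, where page_score is an assumed
    relevance score for the linking page. Returns score-weighted word counts."""
    counts = Counter()
    for html, page_score in pages:
        collector = AnchorTextCollector(target)
        collector.feed(html)
        collector.close()  # flush any buffered text
        for word in collector.words:
            counts[word] += page_score
    return counts


pages = [
    ('<p>Read <a href="http://example.com/tags">a great essay on tagging</a>.</p>', 2.0),
    ('<a href="http://example.com/tags">tagging</a> <a href="http://other.example/">spam</a>', 0.5),
]
print(anchor_terms(pages, "http://example.com/tags").most_common(1))
# [('tagging', 2.5)]
```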

Relevant terms are a consensus

That said, relevant terms for a resource must be seen as a consensus. A consensus between the user himself, other users, the page author and the “rest of the web”.

If all of these players are roughly in agreement, all is well and the term relevance can be trusted. If the page author’s information does not agree at all with the users’ information, the author is probably trying to trick somebody (most likely an innocent search engine bot that wanders by), and the author’s information should be given less weight than it otherwise would.
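A minimal sketch of that agreement check, assuming agreement is measured as cosine similarity between the author’s terms and the terms users have assigned. The function names, the `floor` parameter and the rule of leaving the author’s weight untouched when there are no user terms yet are illustrative choices, not a prescribed method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def author_weight(author_terms, user_terms, base=1.0, floor=0.1):
    """Scale the author's influence by agreement with the users' terms.

    With no user terms there is nothing to disagree with, so the author
    keeps the full base weight; otherwise the weight never drops below
    floor * base.
    """
    if not user_terms:
        return base
    agreement = cosine(Counter(author_terms), Counter(user_terms))
    return base * max(floor, agreement)


users = ["tags", "folksonomy", "search", "tags"]
print(round(author_weight(["tags", "search"], users), 2))               # 0.87
print(round(author_weight(["casino", "pills", "ringtones"], users), 2))  # 0.1
```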

When searching for something, you don’t want to end up empty-handed just because nobody thought of tagging a resource with the terms you entered. The relevant terms must therefore be seen as layers of information: your own tags and other terms you have assigned play the central role; then comes information from other users, giving extra weight to users who are “similar” to you (based on comparing previous behavior); then the page author; and finally the “rest of the web”.
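The layered model can be sketched as a weighted sum. In this hypothetical version, `LAYER_WEIGHTS` encodes the ordering described above (your own terms first, then other users, then the author, then the rest of the web), and similarity between users is approximated by the Jaccard overlap of their tag vocabularies; both the weights and the similarity measure are assumptions. Because every layer contributes independently, a term that nobody has tagged can still score through the author or web layers.

```python
def jaccard(a, b):
    """Overlap of two tag vocabularies: 0.0 = disjoint, 1.0 = identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical layer weights, ordered as in the text.
LAYER_WEIGHTS = {"self": 4.0, "users": 2.0, "author": 1.0, "web": 0.5}


def vocabulary(tagged):
    """All tags a user has used, across every resource they tagged."""
    return set().union(*tagged.values())


def term_score(term, resource, me, user_tags, author_terms, web_terms):
    """How relevant `term` is to `resource`, from the point of view of `me`.

    user_tags:    {user: {resource: set of tags}}
    author_terms: {resource: set of terms taken from the page itself}
    web_terms:    {resource: set of terms from elsewhere, e.g. anchor text}
    """
    mine = user_tags.get(me, {})
    score = 0.0
    if term in mine.get(resource, set()):
        score += LAYER_WEIGHTS["self"]
    for other, tagged in user_tags.items():
        if other == me or term not in tagged.get(resource, set()):
            continue
        # Every agreeing user counts; users similar to `me` count more.
        similarity = jaccard(vocabulary(mine), vocabulary(tagged))
        score += LAYER_WEIGHTS["users"] * (0.5 + 0.5 * similarity)
    if term in author_terms.get(resource, set()):
        score += LAYER_WEIGHTS["author"]
    if term in web_terms.get(resource, set()):
        score += LAYER_WEIGHTS["web"]
    return score


user_tags = {
    "me": {"page1": {"tags", "search"}, "page2": {"python"}},
    "alice": {"page1": {"folksonomy", "tags"}, "page3": {"python"}},
    "bob": {"page1": {"folksonomy"}},
}
author_terms = {"page1": {"folksonomy", "tagging"}}
web_terms = {"page1": {"folksonomy", "bookmarks"}}
for term in ("tags", "folksonomy", "bookmarks"):
    print(term, term_score(term, "page1", "me", user_tags, author_terms, web_terms))
# tags 5.5
# folksonomy 4.0
# bookmarks 0.5
```

Here "folksonomy" was never tagged by `me`, yet it still ranks well because two other users, the author and the web all agree on it.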

Trust systems

In addition, you need trust systems to prevent the search from being “gamed”. Trust should be assigned to all players in the chain: users, authors and linking web pages. If a long-time, consistent and otherwise well-behaved user assigns a tag to a page, we automatically give it high weight. If a new user includes a lot of irrelevant information in his first post, his trust is ruined and will take a long time to regain.

Users gradually build trust within the system, and if they start to misbehave, it affects everything they’ve done before. The same goes for web page authors and linking pages. If your information is in no way in line with the rest of the consensus, it is simply ignored. Providing relevant information is the only way to improve your ranking (more in this post on Spurl.net’s user forums).
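The trust rules above could be sketched as follows, assuming an asymmetric update (trust is earned in small steps and lost in large ones) and tag weights computed from each user’s current trust at query time, so that later misbehavior also discounts earlier contributions. The `User` class, the starting trust and the `gain`/`penalty` rates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    trust: float = 0.1  # hypothetical starting trust for a new user
    contributions: list = field(default_factory=list)  # (resource, tag) pairs


def record_feedback(user, relevant, gain=0.05, penalty=0.5):
    """Asymmetric update: trust is slow to earn and quick to lose.

    A relevant contribution moves trust a small step toward 1.0; an
    irrelevant one cuts it by `penalty` (here, in half).
    """
    if relevant:
        user.trust += gain * (1.0 - user.trust)
    else:
        user.trust *= 1.0 - penalty


def tag_weight(users, resource, tag):
    """Sum the current trust of everyone who assigned `tag` to `resource`.

    Trust is read at query time, so if a user later misbehaves, all of
    their earlier tags lose weight too.
    """
    return sum(u.trust for u in users if (resource, tag) in u.contributions)


veteran = User(trust=0.9, contributions=[("page1", "tags")])
newcomer = User(contributions=[("page1", "tags")])
print(round(tag_weight([veteran, newcomer], "page1", "tags"), 2))  # 1.0
record_feedback(veteran, relevant=False)
print(round(tag_weight([veteran, newcomer], "page1", "tags"), 2))  # 0.55

# Count how many relevant contributions it takes to climb back to 0.9.
steps = 0
while veteran.trust < 0.9:
    record_feedback(veteran, relevant=True)
    steps += 1
print(steps)  # 34
```

With these illustrative rates, a single bad contribution halves a veteran’s trust, and earning it back takes dozens of good ones.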

On the author layer, this could even resurrect the long-dead keywords and description meta tags, which would, after all, be a good idea had they not been so blatantly misused.

Coming to terms with tags

The “folksonomy” discussion has so far centered largely on user-assigned tags and tagging systems, but these are just part of something larger.

Relevant terms, as explained above, are what we are really after, and it just so happens that tags are the most direct way to assign new relevant terms to a resource. But they are far from the only one. Tagging is not “the new search”; it is one of many ways to improve current search methods. A far bigger impact comes from the fact that we’re starting to see human information, tags being one of many possible sources, brought into play in search.

- – -

Hjalmar Gislason is the founder of Spurl.net. The above reflects some of the things his company is doing with the Spurl.net bookmarking system and the Zniff search engine.


One Response

  1. on April 15, 2005 at 15:32 Viðar Másson

    Exactly, yes.


