Author: Hjalmar Gislason

About Hjalmar Gislason

Founder and CEO of GRID (https://grid.is/). Curious about data, technology, media, the universe and everything. Founder of 5 software companies.

Þorláksmessa 2005

Allir sem vilja þekkja okkur velkomnir að venju á Þorláksmessu á:

Laugaveg 82
allskonar
JÓLAGLÖGG

allsnægtir af ávöxtum og öðru góðgæti. Úrvals vörur. Ekkert verð. Þjer ættuð að líta inn í fallegu íbúðina okkar. Hvar sem þjer annars eigið heima í bænum.

Magga&Hjalli

—–

As always, our friends and families are invited to drop in on December 23rd at:

Laugaveg 82
all kinds of
JÓLAGLÖGG

Plenty of fruits and other delicacies. Quality products. Moderate prices 🙂 Thou shouldst drop by, wherever thou livest in the world.

Magga&Hjalli

The original advertisement is from Morgunblaðið, Saturday, December 14th 1935


The main rules:

  • You can safely show up any time after around 4 or 5 PM.
  • You may stay briefly.
  • You may stay long.
  • You may come twice.
  • You may bring children, friends, distant relatives and newfound acquaintances, dogs, horses and rabbits – though preferably not very red-haired ones.
  • You may skip coming altogether – but that will be frowned upon.
  • The party ends when the last guest rolls down the stairs.

We look forward to seeing as many of you as possible. To the rest of you we say:

Merry Christmas
and
a prosperous new year

New blog

I’ve started a new blog directly under hjalli.com

It will be a bit more general and there will be posts in both English and Icelandic as suits each subject.

It’s still very fresh and immature, but I hope to post some interesting thoughts there on Spurl.net, web search and the internet as well as some “old-fashioned” wetware stuff. The Icelandic posts will probably be more general web and technology discussions as well as some personal stuff.

Hope to see you there…

Spurl launches an Icelandic search engine

Originally posted on the Spurl.net forums on November 1st 2005

Spurl.net just launched an Icelandic search engine with the leading local portal / news site mbl.is. Although Iceland is small, mbl.is is a nice, mid-size portal with some 210,000 unique visitors per week – so this has significant exposure.

The search engine is called Embla – it’s a pun on the portal’s name (e mbl a) and the name of the first woman created by the Gods in Norse Mythology.

An Icelandic search engine may seem a little off topic for many of you Spurl.net users, but it is relevant for several reasons:

  • We have been busy for some 4 months now building the next version of Zniff. Embla uses this new code, and it will make its way into both Spurl.net (for searching your own spurls) and Zniff (for searching the rest of the Web) over the next few weeks. The engine is now reliable, scalable, redundant and compliant with a lot of other .com buzzwords – with more relevant search results and subsecond response times on most queries.
  • This commercial arrangement with mbl.is (hopefully the first of many similar) gives us nice financial footing for further development of Spurl.net and related products.
  • Embla uses information from Spurl.net users. Even though only a small portion of our users are Icelandic, we’re using their information as one of the core elements for ranking the search results – with good results. This strengthens our belief in the “human search engine” concept, both for other markets and on the international playing field.

I plan to write more on the search engine itself on my blog soon, but just wanted to mention briefly that with Embla we’re also breaking new ground in another territory: Embla “knows” Icelandic. Most search engines and technologies are of English origin, and from a search standpoint, English is a very simple language. Most English words have only a couple of word forms (such as “house” and “houses”) and while some search engines use stemming (at least sometimes), it doesn’t matter all that much for English. Many languages – including Icelandic – are far more complicated. Some words can – hypothetically – have more than 100 different word forms. In reality, a noun will commonly have about 12-16 unique word forms. Now THIS really matters. The difference in the number of returned results is sometimes 6 to 10-fold, and it improves relevance as well.

We have built Embla so that it searches for all forms of the user’s search words. We also offer spelling corrections for Icelandic words, based on the same lexicon. The data for this comes from the Institute of Lexicography, University of Iceland, but the methodology and technology is ours all the way and can (and will) be used for other languages as well. Quite cool stuff actually – as mentioned, I will write more about that on the blog soon.
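For the technically curious, here is a rough sketch of the kind of word-form expansion described above. The lexicon entries and function names are made up for illustration – this is not Embla’s actual code – but the principle is the same: map every word form to its lemma, then expand a query term into all known forms of that lemma before it hits the index.

```typescript
// Minimal sketch of lexicon-based query expansion (hypothetical data and names).
// Every word form maps to its lemma, and a query term is expanded into all
// known forms of that lemma.

// Tiny stand-in lexicon: word form -> lemma ("hestur" = horse, a few of its forms).
const formToLemma = new Map<string, string>([
  ["hestur", "hestur"],
  ["hest", "hestur"],
  ["hesti", "hestur"],
  ["hests", "hestur"],
  ["hestar", "hestur"],
  ["hesta", "hestur"],
  ["hestum", "hestur"],
]);

// Inverted mapping: lemma -> all of its word forms.
const lemmaToForms = new Map<string, string[]>();
for (const [form, lemma] of formToLemma) {
  const forms = lemmaToForms.get(lemma) ?? [];
  forms.push(form);
  lemmaToForms.set(lemma, forms);
}

// Expand a single query term into every form of its lemma;
// unknown words pass through unchanged.
function expandTerm(term: string): string[] {
  const lemma = formToLemma.get(term.toLowerCase());
  return lemma ? lemmaToForms.get(lemma)! : [term];
}

// Turn a user query into one OR-group per term, ready for the index.
function expandQuery(query: string): string[][] {
  return query.split(/\s+/).filter(Boolean).map(expandTerm);
}

// "hesti" (dative singular) matches documents containing any form of the word.
console.log(expandQuery("hesti"));
// [["hestur", "hest", "hesti", "hests", "hestar", "hesta", "hestum"]]
```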

Founding meeting of the Icelandic Web Industry Association (SVEF)

The founding meeting of the Icelandic Web Industry Association (SVEF) was held yesterday; the association brings together people with an interest in web matters.

More about that at SVEF.is

Yours truly gave a presentation on the development of Embla. The group behind the founding of the association is the very same group that has organized the Icelandic Web Awards in recent years – the award mbl.is received this year for Embla.

The slides from the talk are here:

Þriðjudagstæknin: Communicating with computers

The topic of today’s Þriðjudagstæknin is the channels we use to get some sense into our computers – and they into us.

Our communication with computers is actually quite primitive. We use a mouse and a keyboard to give the computer orders, and it shows us messages on a screen. Some would add sound and even video, but the truth is that we use microphones and speakers almost exclusively for entertainment, or to communicate with other people through the computer, not with the computer itself.

Screens
The screens we use today are actually tiny. A 19 or even 21 inch screen may seem large, but only in comparison with the even smaller screens we have grown used to over the years.

The screen and the desktop are the workspace of us “knowledge workers” – this is our desk. Who would put up with a 19 inch desk? And how much would you get done if you constantly had to rearrange it to fit all your books and papers?

Nearly all laptops can drive an additional external screen, enlarging the workspace. The same can be done with desktop computers, although that requires two graphics cards in the machine. For those who work with a lot of information or large documents (e.g. programmers, stock brokers and graphic designers), there is no question that investing in a second screen pays off.

The mouse
It is a bit amusing that the tool we use the most to make the computer understand what we want is about as primitive as anything we can imagine cavemen using before language was invented. We point at something (move the mouse pointer over it), grunt (click), then point at something else and grunt again – sometimes even twice (double-click). Not unlike Þorsteinn Guðmundsson in the “búddí-búddí” commercial from KB banki (the title is “Tungumál” – “Language”).

Of course this works reasonably well, but not much information fits into each mouse action.

The keyboard
The most powerful tool we have today for expressing ourselves to computers is the keyboard. Not necessarily because it is so user friendly, but because there we are using the means of communication that works best between ourselves – language (or at least something in that direction).

Those who are used to the command line and to keyboard shortcuts in applications get most tasks done much faster than the rest of us, who have to move a hand over to the mouse every time a command needs to be given.

In many ways, the command line has gotten a new lease on life with the Google search box – a kind of command line to the Web: we type in what we want to do or find, and it appears in the blink of an eye.

Such use will only grow, and this approach will be taken further as search engines get better at understanding what we mean, e.g. by recognizing names of things such as people, places and events, and doing more with them. A small example of this can even be found in our Embla, which recognizes, among other things, addresses, book titles, phone numbers and abbreviations, and tries to provide additional information or services based on them.

Another interesting (and very nerdy) attempt to revive the command line can be seen in the YubNub service, which can be used to issue all sorts of commands to search engines, calculators and web services as text commands.

Speech interfaces
Speaking is a very natural way for us to express ourselves. Even if it weren’t continuous speech but single-word commands, it would be a big step towards better communication between man and computer: “open the annual report”, “find Jón Jónsson’s phone number” and so on. This is technically possible today (even in Icelandic), but a fairly extensive system would still be needed to understand all the countless ways the same things can be said.

Speech interfaces also have certain practical drawbacks. Imagine, for example, the din that would build up if all your coworkers were constantly talking to their computers. I find it distracting enough when people around me pick up the phone for the occasional call. Likewise, all that chatter would put considerable strain on the vocal cords, something people in the teaching profession, for one, know all too well – and not fondly.

Speech interfaces will nevertheless be used more and more as speech recognition advances, and it doesn’t hurt that ever more people now own headsets with microphones, alongside the growing use of Skype and other VoIP solutions.

Eye and body movements
Gestures, eye contact and body language are a big part of our natural communication. People have sometimes tried to put a number on how large a share of the meaning is conveyed this way; I recall hearing figures in the range of 25-50% – however on earth that is measured.

There are already tools that can track very precisely where people are looking. Such tools have, among other things, been used for interesting studies of how people read web pages. One could also imagine using them in the same way to steer a pointer on the screen instead of that mouse creature.

And even though modern image analysis probably wouldn’t get far in reading our body language, a computer could pick up important information if it could “see” its surroundings. Just knowing whether there is any movement around the computer at all could help it behave more intelligently, e.g. deciding when to run a virus scan or when to start the screen saver.


Þriðjudagstæknin is a weekly chat segment on the TV station NFS – Tuesdays at 11:10.

Using AJAX to track user behavior

Here’s a thought. With the rise of AJAX applications this is bound to happen, and may very well have been implemented somewhere, even though a quick search didn’t reveal a lot. So here goes:

By adding a few clever Javascript events to a web page, it is possible to track user behavior far more closely than with the typical methods.

Similar methods have for a long time been used on some sites (including my own Spurl.net) to track clicks on links. A very simple script could also log how far down a page a user scrolls, which elements he hovers the mouse over, and even infer how long he spends looking at different parts of the page – and probably several other things usability experts and web publishers would kill (or at least seriously injure) to learn.

Even though it’s nowhere near as precise as the famous eyetracking heatmaps (here’s the one for Google’s result pages), it provides a cheap method for web publishers to help better position ads and user interface elements, experiment with different layouts, etc.
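To make it a bit more concrete, here is a rough browser-side sketch of the idea. The endpoint and payload fields are invented for illustration, and a real implementation would batch, sample and throttle far more carefully.

```typescript
// Rough sketch of client-side behavior tracking (illustrative only; the "/track"
// endpoint and payload fields are made up). It records maximum scroll depth,
// hovered elements and time on page, and posts them when the user leaves.

interface BehaviorReport {
  url: string;
  maxScrollDepth: number;    // furthest point scrolled, as a fraction of page height
  hoveredElements: string[]; // ids of elements the mouse passed over
  secondsOnPage: number;
}

const report: BehaviorReport = {
  url: location.href,
  maxScrollDepth: 0,
  hoveredElements: [],
  secondsOnPage: 0,
};
const startedAt = Date.now();

// Track how far down the page the user has scrolled.
window.addEventListener("scroll", () => {
  const depth =
    (window.scrollY + window.innerHeight) / document.documentElement.scrollHeight;
  report.maxScrollDepth = Math.max(report.maxScrollDepth, Math.min(depth, 1));
});

// Record ids of elements the mouse passes over (a real tracker would debounce this).
document.addEventListener("mouseover", (e) => {
  const id = (e.target as HTMLElement).id;
  if (id && !report.hoveredElements.includes(id)) {
    report.hoveredElements.push(id);
  }
});

// Send the collected data home when the page is being unloaded.
window.addEventListener("pagehide", () => {
  report.secondsOnPage = Math.round((Date.now() - startedAt) / 1000);
  navigator.sendBeacon("/track", JSON.stringify(report)); // hypothetical endpoint
});
```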

Does anybody know of implementations like this? Does it raise some new privacy concerns?

P.S.: I just saw that Jep Castelein of Backbase posted some related thoughts this summer.

Google and user driven search indexes

Needless to say, I am a firm believer that the next big steps in web search will come from involving the users more in the ranking and indexing process.

The best search engines on the web have always been built around human information. In the early days, Yahoo! was the king, based first on Jerry Yang’s bookmark collection and later on herds of human editors categorizing the web on top of Jerry’s collection. The web’s exponential growth soon broke this model, and for a year or so the best ways to search the web involved trying to find one or two useful results amid full pages of porn and casino links on AltaVista or HotBot.

Then Google came along and changed everything with their ingenious use of the PageRank algorithm, which treats a link from one web page to another as a vote for the quality and relevance of the linked page’s content. You know the story. Today, all the big engines use variations of this very same core methodology, obviously with a lot of other things attached to fight spam and otherwise improve the rankings.
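For those who haven’t seen it spelled out, the “link as a vote” idea boils down to a few lines of code. This is just the textbook power-iteration form of PageRank, not Google’s production ranking, which layers much more on top.

```typescript
// Textbook power-iteration sketch of PageRank: each page repeatedly passes its
// score along its outgoing links, so a link acts as a weighted vote for the
// page it points to.

type LinkGraph = Record<string, string[]>; // page -> pages it links to

function pageRank(graph: LinkGraph, damping = 0.85, iterations = 50): Record<string, number> {
  const pages = Object.keys(graph);
  const n = pages.length;

  // Start with an even rank for every page.
  let rank: Record<string, number> = {};
  for (const p of pages) rank[p] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    for (const p of pages) next[p] = (1 - damping) / n;

    for (const page of pages) {
      const outLinks = graph[page];
      if (outLinks.length === 0) {
        // A page with no outgoing links spreads its rank evenly over all pages.
        for (const p of pages) next[p] += (damping * rank[page]) / n;
      } else {
        // Each outgoing link carries an equal share of this page's rank – a "vote".
        for (const target of outLinks) next[target] += (damping * rank[page]) / outLinks.length;
      }
    }
    rank = next;
  }
  return rank;
}

// Tiny example: "b" and "c" both vote for "a", so "a" ends up ranked highest.
console.log(pageRank({ a: ["b"], b: ["a"], c: ["a"] }));
```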

Next steps: From page content to user behavior

But the next step is due and the big guys in the search world are really starting to realize this. A lot of the recent innovations coming from the search engines – especially from Google – are targeted directly at gathering more data about user behavior that they can in turn use to improve web search.

Among the behavior information that could be used for such improvements are:

  • Bookmarks: A bookmark is probably the most direct way a user can say “I think this page is of importance”. Additionally, users usually categorize the things they bookmark, and some browsers and services – such as my very own Spurl.net – make bookmarks more useful by allowing users to add various kinds of metadata about the pages they are bookmarking. More on the use of that in my recent “Coming to terms with tags” post.
  • History: What pages has the user visited? The mere fact that a user visited a page does not say anything about it other than that the page owner somehow managed to lure the user there (usually with good intentions). The fact that a page is frequently visited (or even visited at all) does however tell the search engine that this page is worth visiting to index. This ensures that fresh, new content gets discovered quickly and that successful black hat SEO tricks (in other words, “search engine spam”) don’t go unnoticed for long.
  • Navigational behavior: Things such as:
    • which links users click
    • which links are never clicked
    • how far down do users scroll on this page
    • how long does a user stay on a page
    • do users mainly navigate within the site or do they visit external links
    • etc.

All of these things help the search engine determine the importance, quality and freshness of the page, or even of individual parts of it.
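As a toy illustration of how such signals might be folded into a ranking score – the signal names and weights below are entirely made up, not anything Google or Spurl.net actually does:

```typescript
// Toy sketch of folding user-behavior signals into a page score. The signals
// and weights are invented; a real engine combines far more data and tunes
// the weights against relevance judgements.

interface BehaviorSignals {
  bookmarkCount: number;    // how many users bookmarked the page
  weeklyVisits: number;     // how often it is visited
  avgSecondsOnPage: number; // how long visitors stay
  clickThroughRate: number; // share of result impressions that get clicked (0..1)
}

function behaviorBoost(s: BehaviorSignals): number {
  // Log-scale the raw counts so a handful of bookmarks already matters,
  // but a million visits doesn't drown out everything else.
  return (
    2.0 * Math.log1p(s.bookmarkCount) +
    1.0 * Math.log1p(s.weeklyVisits) +
    0.5 * Math.log1p(s.avgSecondsOnPage) +
    3.0 * s.clickThroughRate
  );
}

// The boost would typically be blended with content- and link-based relevance.
function finalScore(contentScore: number, s: BehaviorSignals): number {
  return contentScore + behaviorBoost(s);
}

console.log(
  finalScore(10, { bookmarkCount: 40, weeklyVisits: 500, avgSecondsOnPage: 90, clickThroughRate: 0.3 })
);
```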

All of these things and a lot more are mentioned in one way or another in Google’s recent patent application. Check this great analysis of the patent for details. Here’s a snip about user data:

  • traffic to a document is recorded and monitored for changes (possibly through toolbar, or desktop searches of cache and history files) (section 34, 35)
  • User behavior on websites is monitored and recorded for changes (click-through, back button etc.) (section 36, 37)
  • User behavior is monitored through bookmarks, cache, favorites, and temp files (possibly through google toolbar or desktop search) (section 46)
  • Bookmarks and favorites are monitored for both additions and deletions (section 0114, 0115)
  • User behavior for documents is monitored for trend changes (section 47)
  • The time a user spends on a website may be used to indicate a document’s quality or freshness (section 0094)

Google’s ways of gathering data

And how exactly will Google gather this data? Simple: by giving users useful tools such as the Toolbar, Deskbar and Desktop Search that improve their online experience and, as a side effect, provide Google with useful information such as the above.

I’ve dug through the privacy policies for these tools (Toolbar, Desktop Search, Deskbar) and it’s not easy to see exactly what data is sent to Google in all cases. The Toolbar does send them the URL of every visited page in order to provide the PageRank display and a few other services. It is opt-out, but on by default, so they have the browser history of almost every user that has the Toolbar set up. Other than this, details are vague. Here are two snips from the Desktop Search Privacy FAQ:

“So that we can continuously improve Google Desktop Search, the application sends non-personal information about things like the application’s performance and reliability to Google. You can choose to turn this feature off in your Desktop Preferences.”

“If you send us non-personal information about your Google Desktop Search use, we may be able to make Google services work better by associating this information with other Google services you use and vice versa. This association is based on your Google cookie. You can opt out of sending such non-personal information to Google during the installation process or from the application preferences at any time.”

It sounds like error and crash reporting, but it does not rule out that other things – like bookmarks, time spent on pages, etc. – are sent as well. None of the above-mentioned products provides a complete, detailed list of what is sent to Google.

Next up: Google delivering the Web

And then came the Web Accelerator. Others have written about the privacy and security issues – and sometimes outright damaging behavior – of the Web Accelerator, but as this blog post points out, the real reason they are providing it is to monitor online behavior by becoming a proxy service for as many internet users as possible.

I’m actually a bit surprised by the Web Accelerator, for several reasons. First of all, Google is historically known for the excellent quality of its products, but this one seems to have been released publicly way too early, judging from the privacy, security and bug reports coming from all over the place. Secondly, the value is questionable. I’m all for saving time, but if a couple of seconds per hour open up a range of privacy and security issues I’d rather just live a few days shorter (if I save 2 seconds every hour for the next 50 years, that amounts to about 10 days) 🙂

The third and biggest surprise comes from the fact that I thought Google had already bought its way into this kind of information with its fiber investments and the acquisition of Urchin. Yes, I’m pretty sure that the fiber investment was made to be able to monitor web traffic over their fat pipes for the reasons detailed above – even as a “dumb” bandwidth provider. Similar things are already done by companies such as WebSideStory and Hitwise that buy traffic information from big ISPs.

Urchin’s statistics tools can be used for the same purpose in various ways – I’m pretty sure Google will find a way to make it worth an administrator’s trouble to share his Urchin reports with Google for some value-add. So why bother with the Web Accelerator proxy play?

Google already knows the content of every web page on the visible web, now they want to know how we’re all using it.

Good intentions, bad communication

Don’t get me wrong. I’m sure that gathering all this data will dramatically improve web search, and I marvel at the clever technologies and methods in use. What bothers me a bit is that Google is not coming clean about how they will use all this data, or even about whether – and to what extent – they are gathering it.

I can see only two possible reasons for this lack of communication. The first is that they don’t want to tell their competitors what they are doing. That can’t be the case – Yahoo, MSN and Ask Jeeves surely have whole divisions that do nothing but reverse engineer Google products, analyze their patent filings and so on – so that would just be naive. The second reason I can think of is that they are afraid users will not be quite as willing to share this information once it becomes clearly visible how massive these data gathering efforts have become.

I’m not an extremely privacy-concerned person myself, but I respect, understand and support the opinions of those who are. The amount of user data being gathered will gradually cause more people to wonder what would happen if Google turned evil.

– – –

Hjalmar Gislason is the founder of Spurl.net. The above reflects some of the things his company is doing with the Spurl.net bookmarking system and the Zniff search engine.

Coming to terms with tags: folksonomies, tagging systems and human information

Over the past few years we’ve seen a big movement from hierarchical categories to flat search. Web navigation and email offer prime examples: Yahoo’s directory gave way to Google’s search, and Outlook’s folders are giving way to the search-based Gmail. It’s far more efficient to come up with and type in a few relevant terms for the page or subject you’re looking for than it is to navigate a hierarchy – a hierarchy that may even have been built by somebody else, somebody who probably has a different mindset and therefore categorizes things differently than you would.

Lately, tags have become the hot topic when discussing information organization (or – some might say – the lack thereof). But tagging and flat search are really just two sides of the same coin.

The main reason tagging is so useful is that it resembles and improves search. It assigns relevant terms to web pages, images or email messages, making it easier to find them by typing in one or more of those relevant terms or selecting them from a list. Furthermore, tag-based systems assign YOUR terms to a resource – based on your mindset, not somebody else’s – making them extra useful for managing one’s own information. The more terms you assign to the resource initially, the more likely it is that one of them comes to mind when you’re looking it up again later on.

But you also want to find new information, not only what you’ve already found or seen. And again, tagging comes to the rescue. If somebody else has found a term relevant for a page, it’s likely that you will too. The more that person’s mindset (read: use of tags, range of interests and other behavior) resembles yours, the more likely their information is to be of use to you. Furthermore, the more people agree on assigning the same tag to a resource, the more relevant we can assume that tag to be for it.

Tags don’t tell the whole story

Tags are wonderful, but they won’t cure cancer all by themselves.

When tagging a resource – especially in these early days of tagging systems – it is quite likely that you are the first person to tag it. Think as hard as you can and you will still not come up with all the terms that you would nevertheless agree are highly relevant to the subject – terms that you might even think of later when you want to look it up again. It is therefore important to use other methods as well to help assign relevant terms to the resource, when possible.

Firstly, there are ways other than tagging to help users identify relevant terms. By allowing them to highlight the snips from the text that they find most interesting, you can give extra weight to that text. Descriptions of the resource, written in free form, are another way. Here you can see an example of a page that has been highlighted using tags, snips and other information from Spurl.net users (another example here).

Secondly, you have the author’s own information, and that should (usually) count for something. In the case of a web page you have the page text and its markup. The markup identifies things that the page author found to be important, among them:

  • The page title
  • Headings
  • Bolded and otherwise emphasized text
  • Meta keywords, descriptions and other meta information
  • Position on the page (things near the top are usually more relevant than those at the bottom)

Analyzing all this text information (both the user’s and the author’s) in relation to expected word frequency identifies more relevant terms than the tags alone.

And there are even more ways to get relevant terms: link analysis (words that appear in and around links to the document, and the relevancy of the linking documents), and so on – basically all the methods that Google-like engines already use when indexing pages with spiders and other automated means.
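To illustrate how terms from all these sources might be merged into one weighted list, here is a toy sketch. The source weights are invented for the example; a real system would also normalize against expected word frequency, as mentioned above.

```typescript
// Illustrative sketch of merging relevant terms from several sources into one
// weighted term list. The source weights are made up for this example.

type TermSource =
  | "tag" | "highlight" | "title" | "heading"
  | "linkText" | "emphasis" | "meta" | "body";

// How much we trust each source of a term (invented numbers).
const sourceWeight: Record<TermSource, number> = {
  tag: 5, highlight: 4, title: 3, heading: 2.5,
  linkText: 2, emphasis: 1.5, meta: 1, body: 0.5,
};

function relevantTerms(evidence: Array<{ term: string; source: TermSource }>): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { term, source } of evidence) {
    const t = term.toLowerCase();
    scores.set(t, (scores.get(t) ?? 0) + sourceWeight[source]);
  }
  // Sort terms by accumulated weight, most relevant first.
  return new Map([...scores].sort((a, b) => b[1] - a[1]));
}

console.log(relevantTerms([
  { term: "folksonomy", source: "tag" },
  { term: "tagging", source: "tag" },
  { term: "folksonomy", source: "title" },
  { term: "search", source: "heading" },
  { term: "search", source: "body" },
]));
// Map(3) { "folksonomy" => 8, "tagging" => 5, "search" => 3 }
```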

Relevant terms are a consensus

That said, relevant terms for a resource must be seen as a consensus. A consensus between the user himself, other users, the page author and the “rest of the web”.

If all of these players are roughly in agreement, all is well and the term relevancy can be trusted. If the page author’s information does not agree at all with the information from the users, it is likely that the author is trying to trick somebody – most likely an innocent search engine bot that wanders by – and the author’s information should be given less weight than otherwise.

When searching for something, you don’t want to end up empty handed just because nobody had thought of tagging a resource with the terms that you entered. The relevant terms must therefore be seen as layers of information, where your own tags and other terms that you have assigned play a central role; then comes information from other users – giving extra weight to those who are “similar” to you (based on comparing previous behavior) – then the page author, and finally the “rest of the web”.
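A minimal sketch of such layering could look like the following. The layer weights and the evidence fields are assumptions made for the example, not Spurl.net’s actual formula.

```typescript
// Sketch of layered term relevance: your own terms count most, then similar
// users, then other users, then the author, then the rest of the web.
// All weights are illustrative assumptions.

interface TermEvidence {
  fromMe: boolean;          // did I tag/describe the resource with this term?
  similarUserCount: number; // users with behavior similar to mine who used the term
  otherUserCount: number;   // remaining users who used the term
  authorUsedIt: boolean;    // does the page title/markup emphasize the term?
  webLinkTextCount: number; // incoming links whose text contains the term
}

function termRelevance(e: TermEvidence): number {
  return (
    (e.fromMe ? 10 : 0) +
    3 * e.similarUserCount +
    1 * e.otherUserCount +
    (e.authorUsedIt ? 2 : 0) +
    0.5 * e.webLinkTextCount
  );
}

// A term I never used can still surface if enough other people and the author agree on it.
console.log(termRelevance({
  fromMe: false, similarUserCount: 4, otherUserCount: 20,
  authorUsedIt: true, webLinkTextCount: 6,
})); // 0 + 12 + 20 + 2 + 3 = 37
```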

Trust systems

In addition, you need trust systems to prevent the search from being “gamed”. Trust should be assigned to all players in the chain: users, authors and linking web pages. If a long-time, consistent and otherwise well-behaving user assigns a tag to a page, we automatically give it high weight. If a new user includes a lot of irrelevant information on his first post, his trust is ruined and will take a long time to be regained.

Users gradually build trust within the system, and if they start to misbehave, it affects everything they’ve done before. The same goes for web page authors and linking pages. If your information is in no way in line with the rest of the consensus, it is simply ignored. Providing relevant information is the only way to improve your ranking. (More in this post on Spurl.net’s user forums.)
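One simple way to model this kind of trust – a sketch under assumed parameters, not the actual Spurl.net/Zniff mechanism – is a score that grows slowly with consistent, on-consensus behavior, collapses quickly on abuse, and scales the weight of everything the contributor adds:

```typescript
// Sketch of a slow-to-earn, quick-to-lose trust score for users (or authors,
// or linking sites). All parameters are illustrative assumptions.

interface Contributor {
  trust: number; // 0..1, starts low for newcomers
}

function newContributor(): Contributor {
  return { trust: 0.1 };
}

// Each contribution judged to agree with the consensus nudges trust up a little...
function rewardGoodContribution(c: Contributor): void {
  c.trust = Math.min(1, c.trust + 0.02);
}

// ...while spammy or wildly off-consensus behavior cuts it down hard,
// which also devalues everything the contributor did before.
function punishBadContribution(c: Contributor): void {
  c.trust = Math.max(0, c.trust * 0.25);
}

// A tag's weight in the index is simply scaled by its contributor's trust.
function weightedTagScore(baseScore: number, c: Contributor): number {
  return baseScore * c.trust;
}

const user = newContributor();
for (let i = 0; i < 30; i++) rewardGoodContribution(user); // a long, well-behaved history
console.log(weightedTagScore(1, user)); // 0.7
punishBadContribution(user);            // one spammy post...
console.log(weightedTagScore(1, user)); // ...and the weight drops to 0.175
```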

On the author layer, such a trust system could even resurrect the long-dead keyword and description meta tags, which are – after all – a good idea that has simply been too blatantly misused.

Coming to terms with tags

The “folksonomy” discussion, so far, has been largely centered on user assigned tags and tagging systems, but these are just a part of a larger thing.

Relevant terms – as explained above – are really what we are after, and it just so happens that tags are the most direct way to assign new relevant terms to a resource. But they are far from the only way. Tagging is not “the new search”; it is one of many ways to improve current search methods. A far bigger impact will come from the fact that we’re starting to see human information – tags being one of many possible sources – brought into the search game.

– – –

Hjalmar Gislason is the founder of Spurl.net. The above reflects some of the things his company is doing with the Spurl.net bookmarking system and the Zniff search engine.

Is search the “next spam”?

I was a panelist at a local conference on Internet marketing here in Iceland last Friday.

As you can imagine, a lot of the time was spent talking about marketing through search engines: both how to use paid-for placement (i.e. ads) and how to optimize pages to rank better in the natural search results.

The latter, usually called “search engine optimization” or just SEO, has a somewhat lousy reputation. More or less everybody who owns a domain has gotten an email titled something like “Top 10 ranking guaranteed” from people claiming they can work some magic on your site (or off it) that will get you to the top of the natural search results for the terms you want to lead people to your site.

Needless to say, these people are frauds. If the spam isn’t enough of a giveaway, there are two additional reasons. First of all, the major search engines guard their algorithms fiercely and change them frequently, precisely to fight off people trying to game their systems in this way. Secondly, they will probably choose an oddball phrase (see the story of Nigritude Ultramarine) that nobody else is using and have you put that on your site. Of course it will get you a top 10 ranking – you’re the only one using it.

Additionally, some of these “respectable professionals” run so-called link farms: huge loops of pages linking to each other to give the impression that any one of them is highly popular because it has so many incoming links. The link farm pages (here is an OLD blog entry I did on that) are usually machine generated and make little or no sense to humans, but may trick the search engines – for a while. When the search engine engineers find out, however, they get in a foul mood and are likely to punish the pages involved or even ban them from their search engines, and then you would have been better off without them in the first place!

There is a range of other tricks that bad SEOs use, but the result is usually the same: they will rank you high for a while, and then have the opposite effect.

On the other hand, there is a series of things you should do to make it easier for search engines to understand what your site is all about, without playing dirty. Most of them are exactly the things that make a page more usable for humans: descriptive page titles, informative link texts, proper use of the terms you want people to find your site by, and last but not least good, relevant information on your subject. If you have a highly usable and informative site, people will link to you as an information source, and that gives you quality incoming links from respectable sites – one of the most important factors for ranking well, especially on Google.

Enough about that – others have written much better introductions to search engine marketing, and a lot of respectable people make a living out of giving healthy tips and consulting on these issues.

The main point is that in the search engine world, there is a constant fight between the bad SEOs and the engineers at the search engine companies.

During dinner after the conference, I had a very interesting chat with one of the speakers, who expressed his concern that “search is the next spam” – meaning not that people will try to game the search engines (they have done so since way back in the 90s), but that they will get so good at it that it starts to hurt the search business in the same way email spam has hurt email.

As mentioned before, there are two ways to get exposure in search engine results: paid-for placement and natural search results. With paid search estimated to have been a 2.2 billion dollar industry last year, there is plenty of incentive for a lot of clever (if maybe a little immoral) engineers to try their best to rank high in the natural results. The natural results are a lot more likely to get clicked, and therefore more valuable, than the paid ones – anybody want to estimate a number?

And the evidence is out there. Even Google sometimes gives me results that are obviously there for the “wrong reason”. Other engines do it more frequently. And even with the “non-spam” results there has been talk recently about the majority of search results being commercial – even in fields where you would expect to get both commercial and non-commercial results.

The paid-for placement results are of course commercial by nature – no problem there, as long as they are clearly marked as such. The natural results, on the other hand, should reflect the best information on the web – commercial or not. The problem is that commercial sites are usually made by professionals, and by now most of them pay at least some attention to optimization that helps them rank higher in the search engines. Not necessarily the bad tricks, just the good ones. And therefore the majority of the natural results are, and will remain, commercial too.

Search engine indexes, populated by machine bots – engineering marvels as they are – simply cannot make the needed distinction here. The ones that can tell what people will find to be good results are – you guessed it – other people.

Human information will become increasingly important in the fight against spam and in keeping search engine results free of commercial bias. Human information was what created Yahoo! in the beginning. It was also the brilliance of Google’s PageRank and link analysis when they began – they were tapping into human information: links that people (webmasters) created were treated as a “vote” for, and as meta information about, the page they linked to. This is what search engines saw as a major asset in the blogging community, and this is why human-created indexes like the ones constantly growing at bookmarking services such as my Spurl.net, del.icio.us and LookSmart’s Furl will become major assets in the search industry in the coming months and years. And with a decent reputation / trust system (think Slashdot), it will be relatively easy to keep the spammers out – at least for a while.