Originally posted on the Spurl.net forums on November 1st 2005
Spurl.net just launched Embla, an Icelandic search engine, together with the leading local portal and news site mbl.is. Although Iceland is small, mbl.is is a nice, mid-size portal with some 210,000 unique visitors per week, so this gives us significant exposure.
An Icelandic search engine may seem a little off topic for many of you Spurl.net users, but it is relevant for several reasons:
- We have been busy for some 4 months now, building the next version of Zniff. Embla uses this new code, and the same code will make its way into both Spurl.net (for searching your own spurls) and Zniff (for searching the rest of the Web) in the coming weeks. The engine is now reliable, scalable, redundant and compliant with a lot of other .com buzzwords, with more relevant search results and subsecond response times on most queries.
- This commercial arrangement with mbl.is (hopefully the first of many similar) gives us nice financial footing for further development of Spurl.net and related products.
- Embla uses information from Spurl.net users. Even though only a small portion of our users are Icelandic, we’re using their information as one of the core elements for ranking the search results, and with good results. This strengthens our belief in the “human search engine” concept, both for other local markets and on the international playing field.
I plan to write more on the search engine itself on my blog soon, but just wanted to mention briefly that with Embla we’re also breaking new ground in another territory: Embla “knows” Icelandic. Most search engines and technologies are of English origin, and from a search standpoint, English is a very simple language. Most English words have only a couple of word forms (such as “house” and “houses”), and while some search engines use stemming (at least sometimes), it doesn’t matter all that much for English. Many languages, including Icelandic, are far more complicated. Some words can, hypothetically, have more than 100 different word forms. In reality a noun will commonly have about 12-16 unique word forms. Now THIS really matters. Searching for all the forms instead of just the one the user typed sometimes changes the number of returned results 6- to 10-fold, and it improves relevance as well.
We have built Embla so that it searches for all forms of the user’s search words. We also offer spelling corrections for Icelandic words, based on the same lexicon. The data for this comes from the Institute of Lexicography, University of Iceland, but the methodology and technology are ours all the way and can (and will) be used for other languages as well. Quite cool stuff actually. As mentioned, I will write more about that on the blog soon.
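To make the idea a bit more concrete, here is a minimal sketch in Python of what lexicon-based query expansion and spelling suggestions could look like. This is not Embla's actual code: the tiny `LEXICON` (a handful of indefinite forms of two Icelandic nouns), the function names, and the use of `difflib` for fuzzy matching are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical toy lexicon: lemma -> some of its inflected forms.
# A real lexicon would hold every form of every word.
LEXICON = {
    "hestur": ["hestur", "hest", "hesti", "hests", "hestar", "hesta", "hestum"],
    "hús": ["hús", "húsi", "húss", "húsum", "húsa"],
}

# Reverse index: any inflected form -> the lemma(s) it belongs to.
FORM_TO_LEMMAS = {}
for lemma, forms in LEXICON.items():
    for form in forms:
        FORM_TO_LEMMAS.setdefault(form, set()).add(lemma)


def expand_term(term):
    """Return every known form of the word that `term` is a form of."""
    term = term.lower()
    lemmas = FORM_TO_LEMMAS.get(term)
    if not lemmas:
        # Unknown word: search for it exactly as typed.
        return {term}
    forms = set()
    for lemma in lemmas:
        forms.update(LEXICON[lemma])
    return forms


def expand_query(query):
    """Turn a query into one OR-group of word forms per search term."""
    return [sorted(expand_term(term)) for term in query.split()]


def suggest_spelling(term, max_suggestions=3):
    """Suggest known word forms that look similar to an unknown term."""
    term = term.lower()
    if term in FORM_TO_LEMMAS:
        return []  # already a known form, nothing to correct
    # The 0.75 similarity cutoff is an arbitrary choice for this sketch.
    return get_close_matches(
        term, list(FORM_TO_LEMMAS), n=max_suggestions, cutoff=0.75
    )
```

With this sketch, `expand_query("hesti")` returns a single group containing all seven listed forms of "hestur", which a search backend could then match as alternatives, while `suggest_spelling("hestr")` proposes nearby forms from the same lexicon.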