Dev Blog 4.1: Cache only, please.

Quick update to the previous post:

I decided to throw some actual data at the ternary tree tag dictionary to see how it performs compared to off-the-shelf solutions.

I started with 1000 random words and fed them to a standard Dictionary, a standard SortedDictionary and the Tag dictionary. I then timed 1000 successful lookups in each.
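For the curious, the harness was along these lines - a minimal sketch assuming 1000 random lowercase words and a Stopwatch around each batch of lookups; the class name and word generator are illustrative stand-ins, and the Tag dictionary run would look the same but go through its own lookup method:

```csharp
// Sketch of the timing harness: build the word list, load the containers,
// then time one pass of 1000 successful lookups through each.
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class LookupBenchmark
{
    public static void Main()
    {
        List<string> words = GenerateRandomWords(1000);

        var dict = new Dictionary<string, int>();
        var sorted = new SortedDictionary<string, int>();
        for (int i = 0; i < words.Count; i++)
        {
            dict[words[i]] = i;
            sorted[words[i]] = i;
        }

        double tDict = Time(() => { foreach (var w in words) { var _ = dict[w]; } });
        double tSorted = Time(() => { foreach (var w in words) { var _ = sorted[w]; } });

        Console.WriteLine($"Dictionary:       {tDict:F8} s");
        Console.WriteLine($"SortedDictionary: {tSorted:F8} s");
    }

    // Time one pass of 1000 successful lookups.
    static double Time(Action lookups)
    {
        var sw = Stopwatch.StartNew();
        lookups();
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }

    // Pseudo-random lowercase words, 3-10 characters, no duplicates.
    static List<string> GenerateRandomWords(int count)
    {
        var rng = new Random(42);
        var set = new HashSet<string>();
        while (set.Count < count)
        {
            int len = rng.Next(3, 11);
            var chars = new char[len];
            for (int i = 0; i < len; i++)
                chars[i] = (char)('a' + rng.Next(26));
            set.Add(new string(chars));
        }
        return new List<string>(set);
    }
}
```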

Unsurprisingly, the standard Dictionary (which uses a hash table for O(1) performance) was the fastest, taking just 0.00005197 seconds to find all 1000 entries. One option I'd been considering for finding tags was simply to run lots of searches against a standard dictionary using different substrings of the user's text, and this initial result suggests that might not be such a crazy idea: we're a long way off consuming even one of the sixteen milliseconds available for smooth interactivity. The Tag dictionary was going to have to really pull some Gs to compete.
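The substring idea, roughly, is a loop like the one below - a sketch only, with the method name and maxTagLength parameter made up for illustration:

```csharp
// Probe a plain Dictionary with every substring of the user's text,
// up to some maximum tag length; each probe is one O(1) hash lookup.
using System;
using System.Collections.Generic;

static class SubstringTagSearch
{
    public static List<string> FindTags(string text, Dictionary<string, int> tags, int maxTagLength)
    {
        var found = new List<string>();
        for (int start = 0; start < text.Length; start++)
        {
            int longest = Math.Min(maxTagLength, text.Length - start);
            for (int len = 1; len <= longest; len++)
            {
                string candidate = text.Substring(start, len);
                if (tags.ContainsKey(candidate))
                    found.Add(candidate);   // one hash lookup per substring
            }
        }
        return found;
    }
}
```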

The standard SortedDictionary (which can be customised to do the fuzzy-lookup I need) was, by comparison, awful, taking 0.00512 seconds. That might not sound like much, but in a large document the number of searches that could get triggered by the right kind of edit could get huge.
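One plausible form that customisation could take is a key comparer that ignores spaces, so keys differing only in spacing compare as equal - a guess at the kind of thing meant, not necessarily the actual implementation:

```csharp
// A SortedDictionary comparer that strips spaces before comparing keys.
using System.Collections.Generic;

class SpaceInsensitiveComparer : IComparer<string>
{
    public int Compare(string x, string y) =>
        string.CompareOrdinal(x?.Replace(" ", ""), y?.Replace(" ", ""));
}

// Usage:
// var sorted = new SortedDictionary<string, int>(new SpaceInsensitiveComparer());
```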

So how would the Tag dictionary fare? It has no hash table and is basically a binary tree with a third leg. Surprisingly well, it turns out.
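For anyone unfamiliar with the structure, here's a minimal ternary search tree sketch, assuming the Tag dictionary follows the textbook layout (the names are illustrative): each node holds one character plus lower/equal/higher children, so every step of a lookup branches three ways.

```csharp
// A textbook ternary search tree: exact insert and exact lookup.
partial class TagTree
{
    class Node
    {
        public char Split;
        public Node Lower, Equal, Higher;
        public bool IsEndOfTag;      // a complete tag ends at this node
    }

    Node root;

    public void Add(string tag) => root = Add(root, tag, 0);

    static Node Add(Node node, string tag, int i)
    {
        char c = tag[i];
        if (node == null) node = new Node { Split = c };

        if (c < node.Split)          node.Lower  = Add(node.Lower,  tag, i);
        else if (c > node.Split)     node.Higher = Add(node.Higher, tag, i);
        else if (i < tag.Length - 1) node.Equal  = Add(node.Equal,  tag, i + 1);
        else                         node.IsEndOfTag = true;

        return node;
    }

    // Exact lookup: branch left/right on mismatch, down the middle on a match.
    public bool Contains(string tag)
    {
        Node node = root;
        int i = 0;
        while (node != null)
        {
            char c = tag[i];
            if (c < node.Split) node = node.Lower;
            else if (c > node.Split) node = node.Higher;
            else
            {
                if (i == tag.Length - 1) return node.IsEndOfTag;
                i++;
                node = node.Equal;
            }
        }
        return false;
    }
}
```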

1000 successful lookups took just 0.000155 seconds - only three times slower than the standard Dictionary. 1000 worst-case unsuccessful ones (all but the last letter matching) clocked in at 0.00017 seconds.

That's pretty good considering that it's performing fuzzy matching (ignoring spaces) and keeps looking for more matches after the first (unlike either of the other tests).
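The space-ignoring, keep-going-after-a-match behaviour can be pictured as a walk like this, continuing the TagTree sketch above - my reading of the behaviour described, not the real code:

```csharp
// Walk the tree from a given position in the text, skipping spaces in the
// input and recording every point where a tag ends, rather than stopping
// at the first hit.
using System.Collections.Generic;

partial class TagTree
{
    // Returns the index just past each tag found starting at 'start'.
    public List<int> FindMatchesAt(string text, int start)
    {
        var matchEnds = new List<int>();
        Node node = root;
        int i = start;
        while (node != null && i < text.Length)
        {
            char c = text[i];
            if (c == ' ') { i++; continue; }               // fuzzy: ignore spaces
            if (c < node.Split) node = node.Lower;
            else if (c > node.Split) node = node.Higher;
            else
            {
                if (node.IsEndOfTag) matchEnds.Add(i + 1); // note the hit, keep going
                i++;
                node = node.Equal;
            }
        }
        return matchEnds;
    }
}
```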

So, this answers an important question: with 1000 tags defined, one fuzzy Tag dictionary lookup costs about the same as three standard dictionary searches - and three substring probes per lookup is nowhere near enough to achieve the same results.

On the other hand, the Tag dictionary's O(log(n)) is going to scale much worse than the standard dictionary's O(1) with higher numbers of tags. There may well come a tipping point - but will anyone ever actually define even 1000 tags?

Time will tell, but I suspect not. The nature of the beast, I think, is a relatively modest dictionary size getting absolutely hammered with fuzzy-lookup requests - and I think what I have is a good fit.
