IRC Quote (2)

Published on 6 Mar 2009 irc algorithm Zeus bot

Zeus WPI has an IRC channel which at any one time contains two dozen geeks discussing a myriad of topics ranging from the latest XKCD to the physics behind not being able to reach absolute zero.

Such a cornucopia of madness wouldn’t be complete without a bot for certain administrative tasks such as keeping stats on each user.

The “riddle” I posted last year is a question we pondered about for a few days back then:

We keep a log-file of all that is said on our IRC channel. What’s the fastest way to extract one random line said by a specified person from that file, with every line having equal chance of being picked.

This we would use to implement a “quote”-command in our custom-made bot, which returns a quote for the named person.

To keep the problem interesting, no “persistent” data can be kept in memory over multiple queries, such as an index or a counter.

Adhemar was the only person to propose a solution, but we also asked our professor for Datastructures & Algorithms, Gunnar Brinkman. As it turns out Adhemar’s solution was very close to the one Prof. Brinkmann suggested.

Brinkmann’s Algorithm

This is the algorithm we were using:

totallines = 1
while not eof(logfile) do
   currentline = readline(logfile)
   if (rand() mod totallines) == 0 then
      currentqoute = currentline
   totallines++
done

In plain English:

For every line i, pick that line with chance 1/i.

Adhemar’s Algorithm

Adhemar’s solution however, is a tad faster on a real-life system because it does not need the relatively expensive mod-operation for every line:

currenthighest = 0
while not eof(logfile) do
   currentline = readline(logfile)
   currentrand = rand()
   if (currentrand >= currenthighest)
      currentquote = currentline
      currenthighest = currentrand
done

Or:

The player who rolls the highest dice gets picked.

Emperical data suggests the second algorithm is about 1% faster than the first. It’s obvious that this problem is an I/O-limited one, so these algorithms are probably as good as it gets without storing any data in memory.

Although the problem is relatively simple, the interesting thing to remember here is how to randomly pick an item from a set with an unknown amount of items.

David Verhasselt

I’m a consultant & entrepreneur. I build webapps and optimize Minimal Viable Products for clients all over the world. If you'd like to chat, hit me up on david@crowdway.com.

Like this? Sign up to get regular updates
David
Verhasselt