Assigning Categories with Bayesian Filtering


Here’s a mad idea I had a few months back: if we can use Bayesian filtering on spam, why not use it to automagically categorise entries? It’s not going to be foolproof, but it would be an interesting experiment, and it may just work! We wouldn’t need a ferociously sophisticated implementation; a simple naïve Bayes classifier would do. My guess is that, once it had been trained on a reasonable number of entries, it would work out what an entry’s about most of the time, perhaps as often as 95% of the time. Anybody willing to try the idea out? I might if I’ve time, though there’s no guarantee that I will.
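To give a rough idea of what a "simple naïve one" might look like, here's a minimal sketch of a multinomial naïve Bayes categoriser in Python. It assumes entries are plain text, that each training entry has exactly one category, and that splitting on runs of letters, digits and apostrophes is good enough tokenisation. The class and method names (`NaiveBayesCategoriser`, `train`, `categorise`) and the sample entries are made up for illustration; this isn't an existing library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesCategoriser:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.category_counts = Counter()         # training entries per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        words = tokenize(text)
        self.category_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def scores(self, text):
        """Return log P(category) + sum of log P(word | category) per category."""
        words = tokenize(text)
        total_entries = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, n_entries in self.category_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            log_prob = math.log(n_entries / total_entries)  # prior
            for word in words:
                # Add-one smoothing stops an unseen word zeroing the product.
                log_prob += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = log_prob
        return result

    def categorise(self, text):
        """Pick the category with the highest log score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


classifier = NaiveBayesCategoriser()
classifier.train("New Python release adds pattern matching", "programming")
classifier.train("Refactoring the compiler's parser in Python", "programming")
classifier.train("Hiking in the Wicklow mountains this weekend", "outdoors")
classifier.train("Camping gear for a wet weekend in the mountains", "outdoors")
print(classifier.categorise("A weekend in the mountains"))  # outdoors
```

Working in log space avoids floating-point underflow when an entry has many words, and the add-one smoothing means a word never seen in a category lowers its score rather than ruling it out entirely. In a real blog you'd train it on the categories you'd already assigned by hand, then have it suggest a category for each new entry.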

Update: It’s after occurring to me that you could also use this to generate a list of closely related entries. It might be overkill, but where you’ve got a large number of entries, CPU cycles are relatively cheap, and other methods might not be effective, it could be a useful way of relating them.
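One way the related-entries idea might be sketched: build a smoothed word model from a single entry, the same word counting and add-one smoothing a naïve Bayes filter uses, and rank every other entry by how probable its words are under that model. This is an assumption about how you'd adapt the filter, a likelihood-based stand-in rather than a full classifier. The function names (`relatedness`, `related_entries`) and the sample entries are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def relatedness(source, candidate, vocab_size):
    """Average per-word log-probability of `candidate` under an add-one
    smoothed word model built from `source` alone (higher = more related)."""
    counts = Counter(tokenize(source))
    total = sum(counts.values())
    words = tokenize(candidate)
    if not words:
        return float("-inf")
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return log_prob / len(words)  # normalise so long entries aren't penalised


def related_entries(entries, title, top_n=3):
    """Rank the other entries by how well `title`'s word model explains them."""
    vocab_size = len({w for text in entries.values() for w in tokenize(text)})
    source = entries[title]
    ranked = sorted(
        (other for other in entries if other != title),
        key=lambda other: relatedness(source, entries[other], vocab_size),
        reverse=True,
    )
    return ranked[:top_n]


entries = {
    "bayes-spam": "Bayesian filtering catches spam by counting words",
    "bayes-categories": "Using Bayesian filtering to categorise blog entries by their words",
    "hiking": "A wet weekend hiking in the mountains",
}
print(related_entries(entries, "bayes-categories", top_n=1))  # ['bayes-spam']
```

Note that the score isn't symmetric: how well A's model explains B can differ from how well B's model explains A. Comparing every entry against every other is quadratic in the number of entries, which is where the "overkill" and "cycles are cheap" caveats come in; precomputing and caching each entry's word counts would be the obvious first optimisation.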