Posts

Showing posts from June, 2009

How discoverable and usable is your PERL module?

I am working on using the IMAPI2 COM component in one of our products for an improved data burning experience. After exhaustively searching for anyone's C# port of the COM error codes laid out by Microsoft, I came up dry. There is a C++ header file included with the Windows SDK, but that doesn't help me in C#, where I wanted to use an enum (uint). There are 85+ error codes, and I also wanted to link them easily to resource file descriptions of each error for localization purposes. Doing that by hand would mean a very tedious import of each enum value as the lookup key for its string, coupled with 85+ copy/pastes. Sounds like a job for PERL? Well, I agree. I haven't used PERL for several months and saw this as an opportunity to pull out my Swiss Army Knife of programming. I am posting this to show how little code there is, and yet how long it took me to get it working (it took me about 3 1/2 hours to get the script to run perfectly). Does 3 1/2 hours sound like an incredibly long amount o...

35x Improved T-SQL Levenshtein Distance Algorithm...at a cost

At work, we noticed a considerable performance hit using a T-SQL implementation of the Levenshtein algorithm created by Michael Gilleland, found here. It seems to me that his approach was to replicate the C-code algorithm found in Wikipedia in T-SQL rather than taking a step back and re-conceptualizing the algorithm from a T-SQL standpoint. Feel free to correct me, but it seems that the algorithm focuses on three things to find the shortest distance between two strings (what it takes to make them the same): (1) how many character insertions are necessary; (2) how many character deletions are necessary; (3) how many character substitutions are necessary. These three operations are necessary to accommodate strings that are misaligned because of typos. If the algorithm is not designed with them in mind, one missing letter will result in inaccurate reporting because it throws the indexing off. The C-code example in Wikipedia uses a 2-D matrix approach, which is a natural, compact, and performant fit for ...
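For readers who have not seen the 2-D matrix approach mentioned above, here is a minimal Python sketch of the textbook formulation. It is an illustration only: it is neither Michael Gilleland's T-SQL code nor the improved T-SQL version this post describes, and the function and variable names are my own. Each cell holds the cheapest way to turn a prefix of one string into a prefix of the other, chosen from a deletion, an insertion, or a substitution.

```python
def levenshtein(source: str, target: str) -> int:
    """Textbook 2-D matrix Levenshtein distance (illustrative sketch)."""
    rows, cols = len(source) + 1, len(target) + 1

    # dist[i][j] = edits needed to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string takes i deletions;
    # building a prefix from the empty string takes j insertions.
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            # A substitution is free when the characters already match.
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )

    return dist[-1][-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3: two substitutions and one insertion. Because every cell considers all three operations, a single missing letter costs one edit instead of shifting every later comparison out of alignment.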