Friday, June 4, 2010

sqlclr: RegEx search in Sql Server

Even thought T-SQL is very efficient in processing set based data, it is very inefficient in executing many string based operations (eg extract strings from a text column based on search patterns). Unfortunately Sql Server doesn't have any native support for Regular Expressions to do this efficiently. Because of the .Net CLR integration, we can write a sql clr function to achieve this functionality. Here is a generic regular expression search function using RegEx that can be used to extract any pattern matched strings from any text column from Sql Server.
You can use the same logic to extract pattern matched strings from files, network shares etc.
1. define an IEnumerable static method with SqlFunction attribute.
2. define a GetRow method to return a single row for the tvf. Sql CLR will call this method to populate the tvf.
3. use a case-insensitive compare method to remove duplicates. You can either omit this method or use a DISTINCT clause in the select statement in Sql Server query.
[SqlFunction(FillRowMethodName = "GetRow", DataAccess = DataAccessKind.Read, TableDefinition = "found varchar(8000)")]
public static IEnumerable GetRegExSearch([SqlFacet(MaxSize = -1)]SqlString text, SqlString regExSearchFor)
{
List allFound = new List();
MatchCollection matchCollection = Regex.Matches(text.ToString(), regExSearchFor.ToString());
for (int i = 0; i < matchCollection.Count; ++i)
{
String found = matchCollection[i].Value.ToString();
if (FoundInList(found, allFound) == false)
allFound.Add(found);
}
return allFound;
}
public static void GetRow(Object obj, out String found)
{
found = obj.ToString();
}
I am implementing this functionality as a table value function (tvf) so that we can use this to join with other tables or tvfs.

To register this assemby in Sql Server, use the deploy option from Visual Studio or copy the dll to sql server and execute

CREATE ASSEMBLY SqlCLR
FROM 'C:\SqlCLR.dll'
GO
CREATE FUNCTION [dbo].[fxGetRegExSearch](@text [nvarchar](max), @regExSearchFor [nvarchar](4000))
RETURNS TABLE (
[found] [nvarchar](max) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SqlCLR].[Functions].[fxGetRegExSearch]

download code

-paul