Friday, March 18, 2016

Extract Data Using Regular Expressions


Earlier, I wrote this article for extracting e-mail addresses from a memo field using regular expressions. What it was able to do is return all strings matching a particular pattern. Recently, I had a request to enhance the code to return all strings "inside" a particular pattern. See the discussion thread at UtterAccess. As a result, the following modified function takes a pattern and uses the Replace method to return only the string inside the "boundary" identified within the pattern.
Public Function ExtractData(strData As String, _
    Optional strDelim As String = ";") As String
'http://accessmvp.com/thedbguy
'2/9/2016
'Extracts data using regular expressions

Dim regEx As Object
Dim regExMatch As Object
Dim var As Variant
Dim strMatches As String

Set regEx = CreateObject("VBScript.RegExp")

With regEx
    .Global = True
    .IgnoreCase = True
    .Pattern = "\[(\w+)\]"
    Set regExMatch = .Execute(strData)
    For Each var In regExMatch
        strMatches = strMatches & strDelim & .Replace(var, "$1")
    Next
End With

ExtractData = Mid(strMatches, Len(strDelim) + 1)

Set regEx = Nothing
Set regExMatch = Nothing

End Function
To use the above function, make sure to replace the "pattern" identifying the boundaries and the string pattern you're trying to extract. For example, as discussed in the UtterAccess thread, the above function looks for a single word inside a pair of square brackets. You can also use the above function to extract the body of an HTML page by using the following pattern: "<body>(.+[\s\S]+)</body>".

I hope you find the above function useful. As usual, all feedback are welcome. Thank you!

No comments:

Post a Comment