Thursday, February 25, 2016

Extract Email Address

This topic is really about Regular Expressions. If you do not know, Regular Expressions are text strings used as a search pattern for matching  a group of characters within a larger string. You might think of them as "wildcard" characters. However, they can do more than just match characters in a string because you can embed "rules" within the search pattern that the search engine will use to evaluate a match.

For example, if you use a wildcard character in an Access query, you might use something like this:
WHERE LastName Like "a*"
In this example, the "*" is a wildcard character that tells the database engine to match all records with last names beginning with the letter "a." But what if you wanted to return all records with last names that begin with the letter "a" or the letter "b?" You might use something like this:
WHERE LastName Like "a*" OR LastName Like "b*"
So far, so good. But what if you wanted all records with last names that begin with the letters "a" through "g?" You would have a very long WHERE clause if you continue with the above syntax.

Enter Regular Expressions

So, to help with that kind of situation, Access allows us to use limited forms of regular expressions in queries. For example, to return records with last names that begin with the letters "a" through "g," we can use the following criteria:
WHERE LastName Like "[a-g]*"
Much simpler, right? However, like I said earlier, the Access' database engine only gives us limited regular expression capabilities. To take full advantage of regular expressions, we'll need to use VBA and an external regular expression engine. Fortunately, there is a library that is available to us as a reference called "Microsoft VBScript Regular Expressions x.x"
To show how powerful regular expressions are, the following code example shows how it can be used to extract an email address from a Memo field. I used this example as a response to a question in the MSDN forums. The user wanted a way to extract the email address from a Memo field that contains data similar to the following:
Lastname: Doe
Firstname: John
Email address: john.doe@acme.com
Company: Acme Co.
First, a disclaimer: I don't claim to be an expert in regular expressions. In fact, I rarely use them. I just know that if the situation is right, they can be a very powerful tool to use. I would encourage you to learn more about regular expressons to enhance your applications.
Here's a code example that uses a regular expression object to look for an email address within a string.
Public Function ExtractEmailAddress(strData As String) As String
'http://accessmvp.com/thedbguy
'6/19/2015
'Extracts the first email address within a string
'regex email pattern source: http://www.regular-expressions.info/email.html

Dim regEx As Object
Dim regExMatch As Object

Set regEx = CreateObject("VBScript.RegExp")

With regEx
    .IgnoreCase = True
    .Pattern = "\b[0-9A-Z._%+-]+@[0-9A-Z.-]+.[A-Z]{2,4}\b"
    Set regExMatch = .Execute(strData)
    If regExMatch.Count > 0 Then
        ExtractEmailAddress = regExMatch(0).Value
    Else
        ExtractEmailAddress = "No email found!"
    End If
End With

Set regEx = Nothing
Set regExMatch = Nothing

End Function
Notes: The above code example uses late binding, and it will only return the first email address found, if any. To see a modified version of the above code that returns all email addresses found as a delimited string, please visit my website.

I hope this short article piques your interest into regular expressions, and perhaps, I will be learning a thing or two from you very soon. Thank you for reading.

Original post date: June 28, 2015

2 comments:

  1. Brilliant DBGuy

    I would say that this was picking up some additional characters for me at the end of some email strings.

    eg
    planning@westlothian.gov.uk we

    instead of
    planning@westlothian.gov.uk

    So I did some googling and found this regex pattern
    strPattern = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-zA-Z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

    I swapped it into your function and that was giving me some issues before the ampersand so I swapped your expression infront of the ampersand and the other one at the back to get.

    .Pattern = "[0-9A-Z._%+-]+@(?:[a-z0-9](?:[a-zA-Z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

    This for me seems to be working well. Not entirely sure if its poorly optimized or not.


    ReplyDelete
    Replies
    1. Hi,

      Glad to hear you found the function somewhat useful and thanks for giving us an updated regex pattern for other to try as well.

      Good luck with your project. - DBG

      Delete