Grabbing Email Addresses From Postfix Logs

January 27, 2019

In a recent incident, I was asked to provide a list of destination addresses being delivered to a particular mail server. Here's an example line from a #Postfix log:

Dec 29 05:00:51 mail01.test.example.com postfix/smtp[28704]: AF02145249: to=<emailtest.address@gmail.com>, relay=gmail-smtp-in.l.google.com[209.85.232.27]:25, delay=0.69, delays=0.01/0/0.16/0.52, dsn=2.0.0, status=sent (250 2.0.0 OK 1544545651 n189bb201234abc.123 - gsmtp)

From that log line, you can see that the #email address is surrounded by angle brackets, “<” and “>”. The opening bracket is preceded by the text “ to=”.

On this fantastic Stack Overflow page I found some very useful grep commands, which I used in compiling part of this answer.

I decided to go with the fastest #grep answer, since I was dealing with multiple gigabytes of mail logs from the Postfix Mail Transfer Agent (MTA):

grep -Po ' to=<\K[^>]*'

Here is what each bit of that means, according to the GNU Grep 3.3 manual:

-P Interpret the #regex as a Perl-compatible regular expression (a “PCRE”). This can also be written as --perl-regexp. PCREs are very powerful regexes that are used in a very wide variety of open source applications. Their syntax is covered in their documentation.
-o aka --only-matching. Only print the matching part of the content when you find a match, as opposed to the default behaviour of printing the entire line. This is useful because it means you don't have to then pipe your output to “cut”, “awk” or other tools to get what you want, but it does mean you need to be more precise with your matches.
' to=<\K[^>]*'
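- “ to=<” Matches a space, followed by the characters “t”, “o”, “=”, “<”. The leading space keeps the pattern from matching inside a longer field name such as “orig_to=<”, which Postfix can log on the same line.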
- \K Report that the match starts here. In other words, you match with the characters “ to=<” but then discard those characters when printing the match. Again, a neat way to only show what you want, rather than having to pipe your output to another application to chop up the output.
- [^>]* This matches anything that isn't a “>” symbol, so the match ends just before the first “>” symbol encountered. The square brackets are a character class and the caret (“^”) symbol negates that character class. The asterisk at the end means “match zero or more of the preceding thing”. (It's OK to end the match before the “>” symbol because in an email address, the domain name portion (that comes after the “@” symbol) cannot contain that character.)
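
To see what \K and the negated character class each contribute, here is a small sketch that runs three variations of the pattern over a shortened copy of the example log line. The variable names are my own, and it assumes a GNU grep built with PCRE support.

```shell
# Shortened copy of the example log line (some fields dropped for brevity).
line='Dec 29 05:00:51 mail01.test.example.com postfix/smtp[28704]: AF02145249: to=<emailtest.address@gmail.com>, relay=gmail-smtp-in.l.google.com[209.85.232.27]:25, status=sent'

# Without \K, -o prints the whole match, including the " to=<" prefix.
with_prefix=$(printf '%s\n' "$line" | grep -Po ' to=<[^>]*')

# With \K, everything matched before it is dropped from the output.
address_only=$(printf '%s\n' "$line" | grep -Po ' to=<\K[^>]*')

# With .* instead of [^>]*, the match runs on to the end of the line.
greedy=$(printf '%s\n' "$line" | grep -Po ' to=<\K.*')

# Square brackets make the leading space in the first result visible.
printf '[%s]\n' "$with_prefix" "$address_only" "$greedy"
```

The first result keeps the “ to=<” prefix, the second is just the address, and the third drags in the “>” and every field after it, which is why the pattern stops at the first “>”.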

When run against the log line shown at the beginning, you get the following output:

$ cat tmp2.txt | grep -Po ' to=<\K[^>]*'
emailtest.address@gmail.com
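
Since the request was for a list of addresses, the raw output usually wants de-duplicating, and possibly narrowing down to one relay first. The following is a minimal sketch of one way to do that, not the exact commands I ran: the log lines, the example.com and example.org hosts, and the temporary file standing in for the real mail log are all made up for illustration.

```shell
# Hypothetical sample log: three deliveries, two of them to the same address.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
Dec 29 05:00:51 mail01 postfix/smtp[28704]: AF02145249: to=<alice@example.com>, relay=mx.example.com[192.0.2.1]:25, status=sent
Dec 29 05:00:52 mail01 postfix/smtp[28705]: AF02145250: to=<bob@example.org>, relay=mx.example.org[192.0.2.2]:25, status=sent
Dec 29 05:00:53 mail01 postfix/smtp[28706]: AF02145251: to=<alice@example.com>, relay=mx.example.com[192.0.2.1]:25, status=sent
EOF

# grep can read the file directly, so no "cat" is needed.
# sort -u collapses repeated addresses into one entry each.
unique_addresses=$(grep -Po ' to=<\K[^>]*' "$log_file" | sort -u)

# Keep only lines for one relay (fixed-string match), then extract as before.
relay_addresses=$(grep -F 'relay=mx.example.com[' "$log_file" | grep -Po ' to=<\K[^>]*' | sort -u)

rm -f "$log_file"

printf 'All recipients:\n%s\n' "$unique_addresses"
printf 'Recipients via mx.example.com:\n%s\n' "$relay_addresses"
```

Filtering on the relay with a plain fixed-string grep before the PCRE step keeps the regex itself unchanged, at the cost of a second pass over the matching lines.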

If you are interested in using regexes more, the RegExr site is a visual regex learning tool. You can use it to build up regexes slowly while understanding exactly what they do. You can also paste in a pre-existing regex to see if the site can describe what it does for you.