There are many things that can throw a wrench in the mail delivery process. Before you start troubleshooting, you need to have a grasp of the actual problem, not just what was reported to you. Do not take the word of a non-technical person at face value when they tell you that ‘email is down for everyone’. That can mean a number of different things, so ask some questions before you start.
- What is the scope of the problem?
- How many people are affected? Almost as importantly, is there anyone who seems UNaffected and can still receive mail?
- Are users able to send mail between each other inside the company but not send or receive to/from people outside?
- When did it start?
- Are there any error messages or common symptoms that the affected users are seeing?
- Are people at outside companies getting any kind of bounceback message when trying to send email to addresses on the affected domain? See if you can have a copy of one of these bouncebacks forwarded to you if at all possible.
- What was changed? It was working and now it is not, so something may have been changed. Ask anyone who may have been working on the affected mail server or domain name within the last day or so. Were there changes to DNS records, firewall rules, the spam filter device, or the spam filtering software on the server? A lot of the time, finding out what was changed will point you toward the cause of your problem.
I would also say that if you are working on a problem for any given mail server or client, you should understand how their mail delivery is configured. If not, you should have someone on hand who does.
On to troubleshooting…
I generally like to take an ‘outside coming in’ approach. I start from the perspective of a mail server out on the Internet trying to deliver mail to the domain for which there is a problem and work my way to the destination mailbox. Here are some of the things that should be checked.
1. MX records. First, you should know what the MX records SHOULD be under normal circumstances. Then, you can use online tools such as MXToolbox or Hexillion.com to find out what the MX records are currently. If the primary MX record is ‘mail.domainname.com’, ping that name from outside the network that contains the affected mail server and see what IP address it resolves to. The ping itself does not need to get a reply, since many firewalls drop ICMP; what you want here is the resolved address. Keep that IP address handy for the next step.
2. Check the firewall. Are there access and NAT rules in place to allow SMTP traffic to come through the firewall to the appropriate server? What is the external address of the mail server or spam filter as configured on the firewall? Does it match the IP address you found in step 1?
3. Is the server or spam filter listening on TCP port 25? From outside the network, run a “telnet <mail server external IP address> 25” command. Do you get a response? A working SMTP server answers with a greeting line that starts with 220. Keep in mind that firewall rules may only allow incoming SMTP connections (port 25) from specific IP addresses on the outside. Therefore, if this test fails, that doesn’t necessarily mean that you have found the problem. Try to telnet to port 25 on the server or spam filter from a computer on the same network to see if it responds.
4. Check the spam filter queue and logs. Oftentimes, a separate spam filtering device or server running spam filtering software will be the entry point for mail into your network. If you have already checked and verified that this device is at least accepting requests on port 25, now go look and see if there is a queue on it that is filling up with mail. In addition, check any logs which are available. Can you tell if this device is accepting, processing, then delivering mail to the destination Exchange/Sendmail/Postfix server?
5. Check the SMTP queue on the mail server itself. If you have verified that mail is coming in past the firewall and past the spam filter, what is happening to it at the next step in its journey? Presumably, at this point, mail is going to a Hub Transport/SMTP server or even a mailbox server after passing through the spam filter. Look in the Queue Viewer (Exchange) or other SMTP logs. Are there messages stuck in a queue waiting to be delivered? If so, are there any specific error messages in the queue stating the reason for the problem? Look in the message tracking logs as well.
6. Check services/processes. Are the Microsoft Exchange services running, such as the Transport and/or SMTP services? Or if using Sendmail or Postfix, are the processes running? Sometimes, even if they are running, restarting the services/processes that deal with receiving mail can correct a problem.
7. Check logs in Windows/Linux for errors. For the Exchange server itself, any diagnostically useful errors will be in the application log. However, keep in mind that Exchange (and mail flow in general) relies heavily on DNS functioning properly. So, you may have many errors that point to an Exchange problem, but they may just be symptoms of an underlying DNS or Active Directory issue.
8. Check the destination mailbox store (Exchange) or individual mailbox. Is the mailbox store online? Is the mailbox full and not able to accept mail? If you find that the mailbox store is offline, there is a whole other set of troubleshooting steps to deal with that problem!
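If you find yourself repeating the name resolution from step 1 and the port 25 test from step 3, they can be scripted. What follows is only a minimal Python sketch using the standard library, not a full diagnostic tool. The function names (`resolve_host`, `smtp_banner`, `check_inbound`) are my own inventions for illustration, and it assumes you already know the MX hostname and the external IP address you expect it to point to. It does not look up the MX record itself; it starts from the hostname you found in step 1.

```python
import socket


def resolve_host(hostname):
    """Return the IPv4 address the hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the greeting line the server sends.

    Returns None if the connection is refused, times out, or nothing
    is received. This is the scripted equivalent of the telnet test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(1024)
    except OSError:
        return None
    return data.decode("ascii", errors="replace").strip() or None


def check_inbound(mx_host, expected_ip, port=25, timeout=10.0):
    """Run the step 1 and step 3 checks and return the findings as a dict."""
    resolved = resolve_host(mx_host)
    banner = smtp_banner(resolved, port, timeout) if resolved else None
    return {
        "resolved_ip": resolved,                  # what the name resolves to now
        "ip_matches": resolved == expected_ip,    # compare with the firewall's external IP
        "banner": banner,                         # raw greeting, if any
        "smtp_ready": bool(banner and banner.startswith("220")),
    }
```

You would call it with something like `check_inbound("mail.domainname.com", "<external IP from the firewall>")`, run from a machine outside the affected network. As with the telnet test, a failed result is not conclusive on its own, because the firewall may only accept SMTP connections from specific outside addresses.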
Although this seems like a lot of things to go through, someone who really knows the mail delivery infrastructure for a domain/network can go through them all in about 20 to 30 minutes. Of course, depending on the answers to some of your pre-troubleshooting questions, you may be able to nail the problem more quickly than that.