Mail.baby increase in false positive results

Incident Report for InterServer

Resolved

The new scanning cluster is online and under testing. If all goes well, it will be rolled out gradually over the next 1 - 2 weeks. With this change, it will be quicker to add resources to the spam scanners and to scale up and down. Yesterday, additional SMTP + scanning systems were brought up for extra capacity, but these types of systems take longer to set up, which makes auto scaling more difficult. Additionally, Mailbaby now checks email before the DATA command in SMTP. Previously, every email was scanned even if it was known to come from an account compromised through a password disclosure. The new check allows emails that we have 100% confidence are from compromised accounts to be blocked quickly.
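As a rough illustration of a check made before the DATA command, the sketch below decides at MAIL FROM time whether to accept the sender. It is not Mailbaby's actual plugin: the function name, the account names, and the in-memory set are hypothetical, and the real system is assumed to pull this list from its existing compromised-account data.

```python
from typing import Set, Tuple

# Hypothetical example data; the account names are invented for illustration.
COMPROMISED_ACCOUNTS: Set[str] = {"mb1001"}


def pre_data_check(auth_user: str, compromised: Set[str]) -> Tuple[int, str]:
    """Return an SMTP reply for MAIL FROM, before any message body is sent.

    A 5xx reply at this stage means the client never reaches DATA,
    so the message is never handed to the spam scanner.
    """
    if auth_user in compromised:
        return 550, "5.7.1 Sending account is blocked"
    return 250, "2.1.0 OK"


if __name__ == "__main__":
    for user in ("mb1001", "mb2002"):
        code, text = pre_data_check(user, COMPROMISED_ACCOUNTS)
        print(user, code, text)
```

Rejecting at this stage costs a single set lookup per connection, whereas a rejection after DATA requires receiving and scanning the whole message first.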

Posted May 15, 2024 - 15:00 EDT

Identified

Service has been operating normally for over 12 hours. Today we will be turning up more scanning servers. Additional work will be done to allow us to scale easily.

Posted May 15, 2024 - 09:18 EDT

Update

As of right now, the issue is 80 - 90% solved. That is, the service is working as it did prior to today, but the system will be future proofed to prevent a recurrence. The issue started less than 24 hours ago, when the rules for spam were growing and our spam scanner was taking longer to process emails. This was not due to an increase in volume and was difficult to pinpoint. When mail is blocked, it is based on the score from our spam scanner. Some emails we know will be blocked, but until now we chose to scan them before blocking them. As of today, emails already known to come from compromised accounts, or that should otherwise be blocked, will be dropped before being scanned, via a new plugin that pulls in the existing data on compromised accounts. This will block spam quicker, without starting any scan, even a prefilter scan. This, though, was neither the full solution nor the cause. Because the problem was difficult to fully pinpoint, a committer of rspamd was brought in to help identify the root cause, and found issues with bayes handling in rspamd 3.8, but not in 3.7 or the development version. Once this was addressed and the fix was pushed out, new email began to process with fewer false positives. As there was more volume, additional scanning systems were brought online and added to our clusters. To further future proof Mailbaby, we will be creating fallback scanning servers that can be spun up even quicker and can scale up immediately as needed, a process that is beginning now.
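The decision order described in this update (drop known-bad senders first, otherwise scan and block on score) can be sketched as follows. This is a simplified model, not the production logic: the threshold value, the function names, and the stand-in scanner are all assumptions made for illustration.

```python
from typing import Callable, Set

# Assumed threshold for illustration only; not Mailbaby's real setting.
REJECT_SCORE = 15.0


def handle_message(
    sender: str,
    body: str,
    known_blocked: Set[str],
    scan: Callable[[str], float],
    reject_score: float = REJECT_SCORE,
) -> str:
    """Return 'drop', 'reject', or 'accept' for one message."""
    if sender in known_blocked:
        # Short-circuit: the scanner is never called for known-bad senders.
        return "drop"
    score = scan(body)
    return "reject" if score >= reject_score else "accept"


if __name__ == "__main__":
    calls = []

    def fake_scan(body: str) -> float:
        # Stand-in for a real scanner; records each call and scores by keyword.
        calls.append(body)
        return 20.0 if "lottery" in body else 1.0

    blocked = {"bad@example.com"}
    print(handle_message("bad@example.com", "hello", blocked, fake_scan))
    print(handle_message("ok@example.com", "you won the lottery", blocked, fake_scan))
    print(handle_message("ok@example.com", "meeting at noon", blocked, fake_scan))
    print("scanner calls:", len(calls))
```

In this model the scanner runs only for senders that are not already known to be blocked, which is where the capacity saving comes from when scanning itself is the bottleneck.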

Posted May 14, 2024 - 18:34 EDT

Update

While we continue to actively work on this, there is significant improvement in identification as we look further into the overall cause. The false positives have been partially traced to a problem with bayes / statistics. Fixes are being tested and rolled out slowly to prevent the accidental passing of actual spam. To help with this, known blocked senders will now be dropped earlier in an SMTP transaction, as opposed to after a full scan of an email.

Posted May 14, 2024 - 14:40 EDT

Investigating

We are currently investigating this issue.

Posted May 14, 2024 - 09:10 EDT

This incident affected: Services (Mailbaby).