Around 4 PM yesterday we started seeing increased error rates in some of our web clusters. We immediately opened an Amazon support ticket and contacted our Amazon monitoring/consulting company, which was already aware of the problem. The ECS service, which manages our Docker clusters, was constantly starting and stopping tasks, which caused some sites to time out periodically. Our consulting company worked directly with Amazon and found that a failing health check was driving the up/down cycling. The root cause was unclear at first, masked by a somewhat chaotic mass of log errors, but they then found a DNS issue within the VPC: a security monitoring tool (Trend Micro Deep Security) had started blocking DNS requests. They fixed the issue, and service was fully restored around 8 PM. No lead systems were down at any point during that period, although there were intermittent stretches when no leads or traffic of any kind reached the systems. This event affected some, but not all, of our clients hosted in Amazon.