Charged IT Solutions LLC - Outage - IRM AZS-1 – Incident details

Outage - IRM AZS-1

Resolved
Major outage
Started 17 days ago
Lasted about 1 hour

Affected

Iron Mountain - AZS-1 - Phoenix, AZ

Major outage from 7:19 PM to 8:14 PM

Updates
  • Postmortem

    On August 2nd at 3:19 PM MST, customers experienced a complete loss of connectivity across all services. This occurred during routine BGP configuration changes intended to enable load balancing across new upstream circuits at our Phoenix datacenter location. The outage was caused by an unexpected interaction between our BGP routing configuration and internal OSPF route advertisements, which prevented proper route installation in our core routing tables.

    All customer services requiring connectivity outside our internal network were affected for 55 minutes (3:19 PM - 4:14 PM MST). No customer data was lost, and all services automatically resumed normal operation at 4:14 PM MST.

    To prevent this type of incident from recurring, we are implementing several measures. All future routing changes, even those not expected to cause impact, will wherever practical be performed during scheduled maintenance windows that coincide with naturally low-traffic periods, and will be subject to mandatory staging and pre-validation. We are also deploying enhanced monitoring of BGP session states and routing table changes with automated alerting, along with a formal change control process for all core routing configuration changes.
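
    As an illustration only, the kind of check that BGP session and routing table monitoring implies can be sketched as follows. This is a minimal sketch, not our production tooling: the Snapshot shape, the find_alerts function, and the 20% route-drop threshold are all hypothetical, and the peer addresses are documentation-range placeholders. It compares two polling results and reports any session that is not in the Established state, any session that disappeared, and any sharp fall in installed routes.

```python
from __future__ import annotations

from dataclasses import dataclass

# "Established" is the only BGP FSM state in which routes are exchanged.
ESTABLISHED = "Established"


@dataclass(frozen=True)
class Snapshot:
    """One polling result for a router (hypothetical shape, for illustration)."""
    sessions: dict[str, str]  # peer address -> BGP session state
    route_count: int          # routes currently installed in the routing table


def find_alerts(previous: Snapshot, current: Snapshot,
                max_route_drop: float = 0.2) -> list[str]:
    """Return human-readable alerts for changes between two snapshots.

    max_route_drop is an assumed threshold: the fraction of installed
    routes that may vanish between polls before an alert is raised.
    """
    alerts = []

    # Any session that is present but not Established is alert-worthy.
    for peer, state in sorted(current.sessions.items()):
        if state != ESTABLISHED:
            alerts.append(f"BGP session to {peer} is {state}")

    # A session that existed on the last poll but is now missing entirely.
    for peer in sorted(set(previous.sessions) - set(current.sessions)):
        alerts.append(f"BGP session to {peer} disappeared")

    # A sharp fall in installed routes, even if sessions look healthy.
    if previous.route_count > 0:
        drop = (previous.route_count - current.route_count) / previous.route_count
        if drop > max_route_drop:
            alerts.append(
                f"installed routes fell {drop:.0%} "
                f"({previous.route_count} -> {current.route_count})"
            )

    return alerts


if __name__ == "__main__":
    before = Snapshot({"192.0.2.1": ESTABLISHED, "198.51.100.1": ESTABLISHED}, 1000)
    after = Snapshot({"192.0.2.1": ESTABLISHED, "198.51.100.1": "Idle"}, 100)
    for alert in find_alerts(before, after):
        print(alert)
```

    Checking the installed route count separately from session state matters for an incident like this one, where the failure was in route installation rather than necessarily in the sessions themselves.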

    We sincerely apologize for the service disruption and remain committed to continuously improving our infrastructure. Following the resolution, the routing improvements were completed, and our Phoenix location now has greater capacity and resilience for customer services. The preventive measures being implemented will significantly reduce the likelihood of similar issues in the future. Thank you for your patience during this incident.

  • Resolved
    This incident has been resolved.
  • Identified
    During routine BGP load balancing configuration changes to bring additional capacity online at our Phoenix location, an unexpected routing table convergence issue occurred. Our team is actively working to restore BGP route advertisements and expects full service restoration within an hour. No customer actions are required at this time. Updates will be posted regularly.
  • Investigating
    We are currently investigating this incident.