HostVenom & WinterNode - MC.GRA3 - Hardware Failure – Incident details

MC.GRA3 - Hardware Failure

Resolved
Partial outage
Started about 1 month ago
Lasted about 13 hours

Affected

WinterNode

Operational from 6:46 PM to 8:07 AM

Gravelines Game Servers

Operational from 6:46 PM to 8:07 AM

mc.gra4.winternode.com

Operational from 6:46 PM to 8:07 AM

Updates
  • Resolved

    We have successfully completed the migration of customer data and databases from the GRA3 node to our new GRA4 node. GRA4 is hosted with the same provider so that IP addresses would not need to change.

    However, we want to be clear: this does not mean the GRA3 hardware issues have been resolved. In fact, GRA3 experienced three additional outages during the migration. For one of the longer incidents, our provider's intervention was limited to a "hard reboot" rather than the full hardware replacement we had requested. We share your frustration and are deeply disappointed with how our provider has handled this situation.

    Key Details:

    • IP Address: We were able to migrate the IP block alongside your servers. As a result, your IP address and ports should remain unchanged.

    • Databases: If you are using our SQL databases, please double-check your configurations, because the database server's IP address changed during the migration to GRA4. The former database server on GRA3 has been turned off, so you will no longer be able to log in to or use it.

    Ongoing Monitoring:

    We will continue to closely monitor the environment for any further issues. Should any new concerns arise, we are prepared to escalate internally and externally as needed to ensure stability and performance.

    Compensation:

    We understand how disruptive this has been and sincerely apologize for the inconvenience. Impacted customers are encouraged to reach out to us to request an account credit for the downtime and related issues.

    Thank you for your patience and understanding as we worked to restore stability. We remain committed to providing better reliability moving forward.

  • Update

    Brief update: We're working on migrating customers to another node right now. We'll be powering down any servers currently online shortly.

    This is a hardware issue, and our provider did not follow through on their previous statements to us. However, we are taking every action in our power to get our customers back on a stable machine as soon as possible.

    Impacted customers should reach out to us at any point to request an account credit for these issues.

  • Identified

    We briefly experienced another unexpected outage at 7:51 PM Pacific Time. Our monitoring now reports the node as online.

    An automatic intervention ticket has been opened. We have not yet received any updates on our previous tickets, and we will keep everyone posted.

  • Monitoring

    The node is back online.

    We are awaiting additional information from OVH. They appear to have performed testing instead of replacing hardware.

  • Identified

    The server replacement should begin shortly.

  • Investigating

    Since customers were migrated to new hardware, MC.GRA3 has experienced two reboots. Our upstream provider has scheduled an intervention to investigate, and we have requested that they replace the server given the number of issues we have had in a short period of time. This should begin shortly, and we will keep you updated.