On 26 May 2022, between 16:33 and 16:43 UTC, two Oracle DAO (ODAO) nodes operated by the Rocket Pool team were compromised. ETH and RPL were stolen from the node accounts.
A team member’s workstation was hijacked using a remote execution exploit. Two unencrypted SSH keys were present on the team member’s workstation and allowed access to the two ODAO nodes.
The attacker gained access to the two ODAO node private keys and drained the accounts of funds.
This is an embarrassing and expensive lesson for the team, and we are establishing improved systems and processes to harden operational security. Thankfully, due to the distributed design chosen for the ODAO, the protocol was not put at any risk.
It is also a stark reminder to our community to remain vigilant.
The first responder, while working with an open telemetry dashboard, noticed that two of the ODAO balances jumped up, then dropped down. On witnessing the balance irregularity, they decided to investigate immediately.
Concurrently, automated alerts fired, warning that two of the ODAO node balances had dropped below an ETH threshold.
The first responder checked the ODAO node balances on Etherscan and confirmed that ETH and RPL had been removed from the nodes.
Detection of the incident was quite quick, both from a first responder and automated alerting perspective. A faster detection could be achieved by an intrusion detection system or detecting suspicious transactions. ODAO nodes perform particular transactions and so any other transaction is suspicious. In this case, it may have enabled detection before the ETH was transferred but it is unlikely to have prevented the impact. Ultimately, prevention is the key here.
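The allowlist idea described above can be sketched as follows. This is a minimal illustration, not the team's tooling: the `Tx` record, the contract addresses, and the 4-byte function selectors are hypothetical placeholders. A real monitor would populate the allowlist from the transactions the ODAO nodes are actually expected to send.

```python
from dataclasses import dataclass

# Hypothetical allowlist of (destination contract, 4-byte function selector)
# pairs that an ODAO node is expected to call. All values are placeholders.
EXPECTED_CALLS = {
    ("0x1111111111111111111111111111111111111111", "0xaaaaaaaa"),
    ("0x2222222222222222222222222222222222222222", "0xbbbbbbbb"),
}


@dataclass(frozen=True)
class Tx:
    """Simplified view of an outgoing transaction from a monitored node."""
    to: str    # destination address (hex)
    data: str  # calldata (hex, "0x"-prefixed; "0x" for a plain ETH transfer)


def is_suspicious(tx: Tx) -> bool:
    """Flag any transaction that is not an expected contract call."""
    selector = tx.data[:10].lower()  # "0x" + 8 hex chars = 4-byte selector
    return (tx.to.lower(), selector) not in EXPECTED_CALLS
```

Under this scheme, a plain ETH transfer to an unknown address has empty calldata and no matching selector, so it is flagged as soon as it is observed rather than when a balance threshold is crossed.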
On confirming the incident, the first responder immediately escalated to the incident manager. Together they triaged the incident to classify it and, more specifically, to determine whether there was a threat to node operators and other ODAO members.
The threat appeared localised because only two of the team’s ODAO nodes were affected but, proceeding with caution, the incident was first reported to the ODAO. Further evidence was gathered to determine whether node operators were under threat.
Containment measures were put in place to protect the remaining team ODAO nodes. Firewall rules were updated to deny all network connections.
Other team members were contacted, and it became apparent that a team member’s workstation was compromised. Containment measures were put in place to isolate the workstation.
The ODAO was informed that the root cause had been discovered and that it was an isolated incident.
Public incident communication was drafted and sent.
A recovery plan was determined and the ODAO was informed that instructions would follow.
There was a delay in putting some containment measures in place because of an unplanned internet outage that affected one of the team members. A backup connection was established but, because of the security IP locking policy, containment was delayed. Given the infrequency of these issues and the positive security benefit of IP restrictions, the trade-off is worth it.
A key contact list was available, but reaching team members was hampered because it was very early in the morning local time.
The Rocket Pool protocol was unaffected by this incident.
- No node operators were affected
- No other ODAO members were affected
- The protocol continued running perfectly
- The ODAO system requires a 51% consensus on its actions and so is robust against this sort of isolated impact.
- In total, approximately 14.75 ETH in value (ETH and RPL combined) was stolen from the affected ODAO nodes.
- Two of the team’s ODAO nodes are now not functioning because the accounts are compromised (and so cannot be sent ETH)
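To illustrate why two compromised nodes could not sway the ODAO, here is a small sketch of a threshold check. The member count in the example is hypothetical (this report does not state the ODAO's size), and reading the 51% rule as "votes divided by members is at least 51%" is an assumption.

```python
def reaches_consensus(votes: int, members: int, threshold_pct: int = 51) -> bool:
    """Return True if `votes` out of `members` meets the consensus threshold."""
    if members <= 0 or not 0 <= votes <= members:
        raise ValueError("members must be positive and 0 <= votes <= members")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return votes * 100 >= threshold_pct * members


# Hypothetical ODAO of 15 members with 2 compromised nodes.
members, compromised = 15, 2
print(reaches_consensus(compromised, members))            # False: attacker alone
print(reaches_consensus(members - compromised, members))  # True: remaining members
```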
The incident is still in recovery mode but the initial impact has been contained.
- The affected server keys have been rotated
- There is no evidence that other servers were exposed but, as a precaution, all other servers have had their keys rotated
- A proposal to kick the affected ODAO nodes has been submitted
- A plan is being executed to recover the RPL bonds from the affected ODAO nodes
Once the affected ODAO nodes have been replaced, we will consider the incident resolved. This will take a couple of weeks because of how ODAO voting works.
All times are UTC
|17 May, 12:40||Team member retrieved the ODAO1 and ODAO3 SSH keys to diagnose an urgent late-night issue with ODAO consensus. The ODAO were not in consensus for balances and so deposits were not possible for a couple of hours. The SSH keys were retrieved from our 2FA-protected shared password manager. Unfortunately, the keys were unencrypted by default (no password) and, due to haste, the team member did not encrypt the keys or remove them once finished.|
|17 May, 12:49||The team member identified the ODAO consensus issue and resolved it quickly to restore ODAO consensus then worked with other ODAO members and the team to make sure it didn’t happen again.|
|Unknown||Team member’s workstation infected by remote exploit. The team member is extremely careful and restricts what they install, but the fact that it is a Windows workstation could have been a contributing factor.|
|27 May, 11:30||Team member noticed that their hard drive was at 100%; they closed some applications and it returned to normal. We believe the attacker was scanning for keys.|
|27 May, 16:07-16:33||Attacker used the team member’s unlocked Metamask to swap the team member’s tokens for ETH and transferred to an attacker account.|
|27 May, 16:33||Attacker gained access to ODAO1 node using the unencrypted SSH key, extracted the private key from the wallet file and transferred its ETH balance to the attacker’s account.|
|27 May, 16:36||Attacker gained access to ODAO3 node using the unencrypted SSH key, extracted the private key from the wallet file, swapped its RPL and transferred its ETH balance to the attacker’s account.|
|27 May, 16:43||Attacker found a private key file on the team member’s workstation that was used for a personal bot and transferred its ETH balance to the attacker’s account.|
|27 May, 16:45||Attacker revisited the first Metamask account it drained to empty the last of the ETH.|
|27 May, 16:51||Attacker sold some of the ETH for BTC on an exchange|
|27 May, 16:54||Incident discovered - Automated alert warned that ETH balance on ODAO3 was below threshold|
|27 May, 16:59||Attacker sold the rest of the ETH for BTC on an exchange|
|27 May, 17:24||Incident investigated/escalated - First Responder checked the ODAO accounts, realised they had been emptied and immediately escalated to the Incident Manager|
|27 May, 17:36||Automated alert warned that ETH balance on ODAO1 was below threshold|
|27 May, 18:00||Incident classified - as high; evidence suggested an isolated incident as no other ODAO members were affected. Did not want to rule anything out so gathered evidence to ensure node operators were not under threat.|
|27 May, 18:37||Incident reported to ODAO - Brought incident to ODAO’s attention, we are investigating|
|27 May, 20:22||Incident containment applied - ODAO firewall rules updated, just in case it was a network based exploit|
|27 May, 20:57||Discovered team member machine compromised, reviewed repercussions|
|27 May, 21:18||ODAO updated on incident - Root cause discovered, no issue with smart node stack|
|27 May, 22:53||Rocket Pool Discord community informed of incident|
|27 May, 23:25||Formulated plan to kick compromised nodes and recover RPL bond|
|27 May, 23:29||Planned and started key rotation|
|28 May, 00:06||ODAO updated on incident - Will provide instructions soon about recovery plan|
|31 May||Incident review conducted|
|3 June||Incident post-mortem published|
|What happened?||ODAO accounts compromised and drained of ETH and RPL|
|Why did that happen?||Because the ODAO nodes are hot wallets and have to have the Ethereum keys available to perform their duties.|
|Why did that happen?||Because the ODAO node needs ETH to function and one of the ODAO node’s withdrawal addresses was not set. So claimed RPL was in the node’s hot wallet.|
|Why did that happen?||Because an intruder was able to ssh into two of the ODAO nodes|
|Why did that happen?||Because they were able to access an unencrypted SSH key|
|Why did that happen?||Because we stored unencrypted SSH keys on a privileged access workstation|
|Why was this a problem?||Because a remote access exploit was used to compromise a privileged access workstation|
|What can we learn from this?||Assume workstations can be compromised at any time|
|What can we learn from this?||SSH keys should have a password by default|
|What can we learn from this?||Always set the node’s withdrawal address|
|What can we learn from this?||2FA would have prevented the access|
|What can we learn from this?||Consider sandboxing development machines using encrypted VMs|
|What work could we do to make sure the incident does not happen again?||Roll out 2FA sign in to all ODAO nodes|
|What monitoring can we put in place to identify the issue sooner?||Monitoring was good: ETH balance alert fired, although could have been quicker|
|Are there proactive measures we can put in place?||ODAO protocol design: separate hot and cold wallet, improves recovery|
|Are there proactive measures we can put in place?||Expand the ODAO: introduce new members to further reduce risk|
- Telemetry and automated alerting were effective
- The team reacted quickly and worked effectively under immense pressure
- Incident management process preparation proved valuable
- Incident communication with the ODAO was prompt and continuous
- Initial communication with the community should have been sooner
- Containment was hampered by an unforeseen situation, but it should have been applied more quickly
|Confirm withdrawal address set on all ODAO nodes||Yes||Yes|
|SSH key rotation||Yes||Yes|
|SSH keys encrypted (has password) by default in shared password manager||Yes||Yes|
|Apply 2FA on SSH to ODAO nodes||Yes||No|
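The "SSH keys encrypted by default" action above could be audited with a check along these lines. This is a best-effort sketch, not the team's process: it relies on the OpenSSH private key layout (a magic string followed by a length-prefixed cipher name, which is `none` for unencrypted keys) and on the `Proc-Type` header used by legacy PEM keys. The function name is hypothetical.

```python
import base64
import struct

OPENSSH_MAGIC = b"openssh-key-v1\x00"


def is_private_key_encrypted(pem_text: str) -> bool:
    """Best-effort check of whether a PEM-style SSH private key has a passphrase."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    if not lines or not lines[0].startswith("-----BEGIN "):
        raise ValueError("not a PEM-encoded key")
    header = lines[0]

    # PKCS#8 encrypted keys announce themselves in the header line.
    if "ENCRYPTED PRIVATE KEY" in header:
        return True

    if "OPENSSH PRIVATE KEY" in header:
        body = base64.b64decode("".join(lines[1:-1]))
        if not body.startswith(OPENSSH_MAGIC):
            raise ValueError("unexpected OpenSSH key format")
        offset = len(OPENSSH_MAGIC)
        # The cipher name is a length-prefixed string right after the magic.
        (name_len,) = struct.unpack(">I", body[offset:offset + 4])
        cipher = body[offset + 4:offset + 4 + name_len]
        return cipher != b"none"

    # Legacy PEM keys (e.g. "RSA PRIVATE KEY") mark encryption in a Proc-Type header.
    return any(ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines)
```

Running such a check over the entries in a shared password manager, or over a workstation's key directory, would surface any key that could be used without a passphrase.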