Regarding the Hosting of the Rewards Merkle Tree

jcrtp · 13 April 2022 06:09

This post describes an issue we encountered while implementing the new Merkle Tree-based rewards system (“Garlic Bread”) and aims to start a discussion among our experienced community members about possible solutions.

Original Design

Originally, the rewards system was designed as follows:

At each rewards interval, the Oracle DAO nodes would generate a Merkle Tree that enumerated the total RPL and total ETH earnings (for the Smoothing Pool) of each Rocket Pool node.
This tree would be uploaded independently by the Oracle DAO nodes to IPFS; as each node would generate the same tree (and thus the same hash), this could be done without any coordination. IPFS would be baked into the Smartnode stack as an extra capability.
The Oracle DAO nodes would upload the Merkle Root they arrived at to the Rocket Pool smart contracts.
Once 51% of them voted on the same Merkle Root, it would be canonized as the official root of the tree for that rewards period.
Node Operators would then pull this file from IPFS once available and use it to claim their rewards.
Optionally, Node Operators could run IPFS instances as well to host those files and contribute to the decentralization of the rewards system.
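
To make the “same tree, same hash” property from the list above concrete, here is a minimal sketch of how independent Oracle DAO nodes could each derive an identical Merkle Root from the same rewards data, and how a root might be canonized once 51% of members agree. Everything here is an assumption for illustration: the leaf encoding, the sorted-pair hashing, the function names, and the use of SHA-256 (an Ethereum contract would more typically use keccak256, which Python's hashlib does not provide; its sha3_256 is a different function).

```python
import hashlib
from collections import Counter
from typing import Dict, Optional, Tuple


def leaf_hash(address: str, rpl_wei: int, eth_wei: int) -> bytes:
    # Hypothetical leaf encoding: 20-byte address + two 32-byte big-endian amounts.
    addr = bytes.fromhex(address[2:] if address.startswith("0x") else address)
    return hashlib.sha256(addr + rpl_wei.to_bytes(32, "big") + eth_wei.to_bytes(32, "big")).digest()


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sorting the pair makes the parent hash independent of left/right order.
    return hashlib.sha256(min(a, b) + max(a, b)).digest()


def merkle_root(rewards: Dict[str, Tuple[int, int]]) -> bytes:
    # Sort the leaves so every node that sees the same rewards data builds the
    # same tree, no matter what order it processed the nodes in.
    level = sorted(leaf_hash(addr, rpl, eth) for addr, (rpl, eth) in rewards.items())
    if not level:
        raise ValueError("cannot build a tree with no leaves")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def canonical_root(votes: Dict[str, bytes], member_count: int, threshold_pct: int = 51) -> Optional[bytes]:
    # Return the most-voted root once it has at least threshold_pct% of all
    # members' votes (members who haven't voted yet count against it).
    if not votes:
        return None
    root, count = Counter(votes.values()).most_common(1)[0]
    return root if count * 100 >= threshold_pct * member_count else None


# Example: three nodes' rewards as (RPL wei, Smoothing Pool ETH wei).
rewards = {
    "0x" + "11" * 20: (10**18, 0),
    "0x" + "22" * 20: (2 * 10**18, 5 * 10**17),
    "0x" + "33" * 20: (0, 10**17),
}
root_a = merkle_root(rewards)
root_b = merkle_root(dict(reversed(list(rewards.items()))))  # different iteration order
votes = {"odao-1": root_a, "odao-2": root_b, "odao-3": bytes(32)}
print(root_a == root_b)                                 # True
print(canonical_root(votes, member_count=3) == root_a)  # True: 2 of 3 votes
```

Sorting the leaves is what removes the need for coordination: nodes only have to agree on the data, not on the order they processed it. For the IPFS hashes to match as well, the uploaded file itself would also have to be serialized byte-for-byte identically by every node.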

The Problem

While experimenting with an initial implementation of this design, I ran into what I believe is a major issue: IPFS is not anonymous. In other words, it is relatively easy to retrieve the IP address of anybody hosting (“pinning”) a particular file.

Because the Oracle DAO members would be the original hosts of each period’s Rewards Merkle Tree, they would initially be the only ones with access to that file. In practical terms, this design would make it trivial to find the IP addresses of the Oracle DAO members and, for any Rocket Pool node operators who opt into running an IPFS node to “rehost” the file, their IP addresses as well. This has obvious implications for censorship resistance and for the potential of DDoS attacks on both the Oracle DAO and Node Operators alike.

Because of this, I submit that we need to look at other options or workarounds instead of following the original design.

Solution A: Require VPNs for all Oracle DAO Members / Rocket Pool Node Operators

The first solution is the “easiest” fix that keeps the original design intact. If the problem is that the Oracle DAO members’ IP addresses can be discovered, enforce some kind of obfuscation, such as a VPN service, that lets them quickly mask and change their IP addresses in case of attack.

The problem with this is that it doesn’t scale: because the nodes must necessarily pin the rewards files, malicious actors can consistently discover the new IP addresses and keep targeting them every time those addresses change.

A related issue is how to handle regular Node Operators who want to rehost a file; they will either have to opt into a VPN provider (which will add some cost overhead, require additional documentation, and may hinder validation performance), or expose their IP to the world as a Rocket Pool node operator.

Solution B: Every Node Builds the Tree Independently

This option removes IPFS and dependence upon the Oracle DAO as the source of the Merkle Tree entirely. Instead, after a rewards checkpoint, every Rocket Pool member’s watchtower container will essentially run through the same process as the Oracle DAO nodes to generate the entire Merkle Tree from nothing but the chain data they already have for the Execution Layer (eth1) and the Beacon Chain (eth2).

Admittedly, this is the option we’ve spent the most time talking about, but it’s not without its faults.

The advantages:

Every node operator knows they’re generating the tree from scratch; they don’t have to trust the tree the Oracle DAO generated (unless they arrive at a different Merkle Root, in which case we’ll need some kind of conflict resolution to determine why this happened)
No need to share files, so no exposing IP addresses
Relatively easy to implement, doesn’t add much extra development overhead

The disadvantages:

If the user’s Execution client goes offline for more than 128 blocks (~30 minutes) after a checkpoint is hit, they won’t be able to generate the rewards tree anymore since they no longer have the state data for the snapshot block. They would either need temporary access to some kind of archive node such as Alchemy, or would need to use a pre-generated file from somewhere else. Perhaps we could host the file the Oracle DAO produced on the Rocket Pool website as a backup for users in this situation?
Generating the Merkle Tree is computationally taxing when it comes to the Smoothing Pool, because it has to look at the complete attestation history of every opted-in node for the entire interval. In the worst case, this can take hours if active real-time caching isn’t implemented or its cache is lost.
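
The recovery window in the first bullet is simple arithmetic. The sketch below assumes a client that, like Geth in its default pruning mode, only keeps state for roughly the most recent 128 blocks, and treats the average block time as a parameter (13 seconds is an assumed default, not a measured value). The helper names and the fallback order, which follows the options mentioned in that bullet, are hypothetical.

```python
RETAINED_BLOCKS = 128  # assumption: the client keeps state only for the latest 128 blocks


def blocks_remaining(head_block: int, snapshot_block: int, retained: int = RETAINED_BLOCKS) -> int:
    """Blocks left before the snapshot block's state is pruned (0 if already gone)."""
    if head_block < snapshot_block:
        raise ValueError("the rewards checkpoint has not been reached yet")
    return max(0, retained - (head_block - snapshot_block))


def window_minutes(retained: int = RETAINED_BLOCKS, block_time_s: float = 13.0) -> float:
    # Rough wall-clock length of the window; block_time_s is an assumed average.
    return retained * block_time_s / 60


def choose_source(head_block: int, snapshot_block: int, has_archive_access: bool) -> str:
    # Hypothetical fallback order: local state first, then an archive node,
    # then a pre-generated copy of the Oracle DAO's file.
    if blocks_remaining(head_block, snapshot_block) > 0:
        return "local"
    if has_archive_access:
        return "archive-node"
    return "hosted-copy"


print(round(window_minutes(), 1))                    # 27.7
print(choose_source(15_000_100, 15_000_000, False))  # local
print(choose_source(15_000_200, 15_000_000, False))  # hosted-copy
```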

Solution C: Use an Anonymous File Hosting System

This solution is the same as the original design, but replaces IPFS with a file sharing protocol that preserves anonymity by design and doesn’t expose IP addresses. For this, we would have to investigate something like Freenet as an alternative.

I am not well-versed enough in projects of this kind to discuss their merits, but I openly invite community members who are familiar with them to offer suggestions here.

Solution D: Use a Centralized Endpoint

In this solution, the Oracle DAO will create the Rewards Merkle Trees as expected but will share them only with a trusted, centralized endpoint that hosts them. For example, this could be the Rocket Pool website or Infura’s own IPFS endpoint. Node Operators would then access this endpoint during a rewards claim (or perhaps it would be baked directly into the Smartnode).

While this is probably the easiest solution, it’s also the most fragile because it’s centralized. If that endpoint fails, users can no longer claim rewards unless redundant copies of the files are hosted elsewhere. Trusting the data isn’t an issue, because the Merkle Root is recorded on-chain (so you always know whether or not your tree file is accurate), but file management is, and having to go out of their way to find a “mirror” of the file sours the UX of rewards claiming.
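
Because the Merkle Root is recorded on-chain, a client does not have to trust any particular mirror: it can try several and keep the first file whose recomputed root matches. A minimal sketch, assuming a hypothetical JSON file format (address mapped to [RPL wei, ETH wei]), SHA-256 with sorted-pair hashing in place of whatever the contracts actually use, and mirrors modeled as plain fetch callables rather than real HTTP or IPFS clients:

```python
import hashlib
import json
from typing import Callable, Iterable, Optional


def _hash_pair(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(min(a, b) + max(a, b)).digest()


def root_of(tree_file: bytes) -> bytes:
    # Hypothetical file format: {"0x<address>": [rpl_wei, eth_wei], ...}
    rewards = json.loads(tree_file)
    level = sorted(
        hashlib.sha256(bytes.fromhex(addr[2:]) + rpl.to_bytes(32, "big") + eth.to_bytes(32, "big")).digest()
        for addr, (rpl, eth) in rewards.items()
    )
    if not level:
        raise ValueError("empty tree file")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def first_valid_copy(mirrors: Iterable[Callable[[], bytes]], onchain_root: bytes) -> Optional[bytes]:
    # Try each mirror in turn; a copy is accepted only if the root recomputed
    # from its contents matches the root recorded on-chain.
    for fetch in mirrors:
        try:
            data = fetch()
            if root_of(data) == onchain_root:
                return data
        except (OSError, ValueError, KeyError, TypeError, AttributeError, OverflowError):
            continue  # unreachable mirror or malformed file: try the next one
    return None


good = json.dumps({"0x" + "11" * 20: [10**18, 0], "0x" + "22" * 20: [0, 5 * 10**17]}).encode()
tampered = good.replace(b"1000000000000000000", b"2000000000000000000")
onchain_root = root_of(good)  # stand-in for the root read from the contracts


def offline_mirror() -> bytes:
    raise OSError("mirror unreachable")


chosen = first_valid_copy([offline_mirror, lambda: tampered, lambda: good], onchain_root)
print(chosen == good)  # True
```

This addresses trust but not availability: if every mirror is down, the function simply returns None, which is the fragility described above.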

Also, it should go without saying that Rocket Pool’s ethos is to maximize decentralization where possible, so this option should only be used as a last resort if all of the others cannot be adopted.

Solution E: Do Nothing

Maybe this isn’t actually a problem. Maybe having the IP addresses of the Oracle DAO nodes and/or Rocket Pool node operators isn’t a concerning factor. In that case, this is all moot and we should continue with the original design.

Discussion

With that, I will open this topic up for community discussion. Which option do you prefer and why? How do you plan to address the shortcomings? Is there an option I didn’t include that you want to present?

Let’s see what we can come up with!

cyberhorse · 13 April 2022 06:25

Would it be possible to combine Solution B with an option to download the most recent “checkpoint” Merkle tree from a centralised endpoint that only grants access to a known operating node?

That way it is mostly decentralised but the central node can be used infrequently in an emergency.

jcrtp · 13 April 2022 06:28

Yes, I tried to capture that idea in one of the bullets of Solution B:

Perhaps we could host the file the Oracle DAO produced on the Rocket Pool website as a backup for users in this situation?

I’m sure we could do this (and host all of the Merkle Trees for each interval, not just the most recent one) as a public service for users to grab in case of emergency. The only thing is that it becomes a slippery slope leading to Solution D, where people forgo the generation process and just rely on our hosted copies, which isn’t really what we want to encourage because it becomes a central point of failure if used as a primary source.

cyberhorse · 13 April 2022 06:41

We are already centralised in terms of having access to the latest software build for instance. I don’t see the mostly decentralised + emergency backup as being much different. To me the most important aspect is to prevent casual access by outsiders to IP details for the entire network.

Could each NO have a merkle tree specific to their own rewards as opposed to one for the entire network? In other words they should really only know about their own reward history. The central facility if accessed by an NO would deliver only their own entitlement.

jcrtp · 13 April 2022 06:48

Could each NO have a merkle tree specific to their own rewards as opposed to one for the entire network? In other words they should really only know about their own reward history. The central facility if accessed by an NO would deliver only their own entitlement.

Unfortunately no, because each node needs the proof for its own entry, and deriving that proof requires the entire tree. The RPL rewards aren’t actually hard to do; my Pi can build the entire thing for Mainnet in about 5 seconds. The hard part is the Smoothing Pool and calculating everyone’s effective uptime to determine their rewards.
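
Why a per-operator tree doesn't work can be seen from how a Merkle proof is built: the proof for one node is the list of sibling hashes on the path from its leaf to the root, and those siblings are computed from every other node's leaf. A minimal sketch, assuming SHA-256 with sorted-pair hashing and hypothetical function names; the real rewards tree's leaf format and hash function may differ:

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sorted-pair hashing: the verifier does not need left/right position flags.
    return h(min(a, b) + max(a, b))


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    # levels[0] holds every node's leaf; levels[-1] holds only the root.
    levels = [sorted(leaves)]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2:
            level.append(level[-1])  # odd level: last hash is paired with itself
        levels.append([hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def proof_for(leaf: bytes, levels: List[List[bytes]]) -> List[bytes]:
    # Collect the sibling at each level on the way up. Every sibling is derived
    # from other nodes' leaves, which is why the whole tree is needed.
    idx = levels[0].index(leaf)  # raises ValueError if this node has no leaf
    proof = []
    for level in levels[:-1]:
        sibling = idx ^ 1
        proof.append(level[sibling] if sibling < len(level) else level[idx])
        idx //= 2
    return proof


def verify(leaf: bytes, proof: List[bytes], root: bytes) -> bool:
    # Roughly what a claim would check against the on-chain root.
    acc = leaf
    for sibling in proof:
        acc = hash_pair(acc, sibling)
    return acc == root


leaves = [h(f"node-{i}".encode()) for i in range(5)]
levels = build_levels(leaves)
root = levels[-1][0]
my_proof = proof_for(leaves[2], levels)
print(len(my_proof), verify(leaves[2], my_proof, root))  # 3 True
```

For five leaves the proof is only three hashes, but each of those hashes depends on leaves belonging to other operators.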

NonFungibleYokem · 13 April 2022 06:53

I think there is probably a solution in bootstrapping the distribution of the Merkle tree via an explicitly privacy-prioritized protocol, possibly a Tor hidden service that points to a simple webserver. If the oDAO members were to publish a hidden service address on chain (maybe even resolvable by ENS?) where the latest Merkle tree could be found, other node operators could opt into downloading it via Tor and then republish it onto IPFS for wider consumption. It’s something that could even be set up on very low-cost VPS servers separate from the node.

Alternately, self-production of the Merkle tree could probably be done more easily by node operators running Erigon in pruned mode or Besu in its archive Bonsai mode. The 128-block limit is really only a problem for Geth users, and tree generation could be done on hardware separate from what’s hosting the node.

hitsuzen.eth · 13 April 2022 07:57

Could we use storage services like Aleph, Filecoin or Skynet (Sia)?

I recommend the use of Aleph, which would easily solve the problem.

pk3268 · 13 April 2022 08:28

I haven’t followed it lately, but perhaps Swarm would be an alternative to IPFS. It’s an ethereum native project.

Mig21 · 13 April 2022 10:21

A mix of options D and E could be OK?

oDAO nodes could upload the file to public gateways and to the Solution D endpoint before sharing it.

Perhaps if a lot of IPs are hosting the files, the risk is mitigated?

Even Mysterium (https://www.mysterium.network) could be useful.

fornax · 13 April 2022 14:20

oDAO members could submit the file to any IPFS storage service like nft.storage or Pinata (or even both for more decentralization); that way their IP won’t be made public. Free tiers would probably be enough. After that, any user who wants to help could also pin the contents to improve decentralization and availability.

We could even combine solutions: have the Merkle tree posted on IPFS, but give anyone interested the option to rebuild and verify it.

Ilu · 13 April 2022 14:33

Clearly Solution B, with an optional Solution D for people who can’t do B. This would be the most decentralized, private and trustless solution. And that is what this whole thing is all about, right?

peteris · 13 April 2022 15:38

Solution E. I don’t think it’s a big deal. If an oDAO member or a node operator is worried about their privacy or DoS attacks, they can set up a VPN, move the node somewhere else, set up restrictive firewall rules or take other measures. If a node operator runs Rocket Pool at home, they can simply not opt in.

One way oDAO members and node operators could protect themselves is to run an external IPFS node somewhere else. The Smartnode would then connect to that external IPFS address.

Another option, which could be enabled during an attack, would be to set up a public external node somewhere else and restrict the local IPFS node to accept connections only from that external node. This way the content is available, but the local IPFS node is not accessible to the world. No changes to the Smartnode would be needed for this. This would protect against DoS attacks, though I’m not sure whether it leaks IPs.

knoshua · 13 April 2022 16:14

The protocol needs the oDAO to perform their duties or things could break. It’s not about oDAO privacy, but about protocol security. Anything that works without publishing oDAO IPs is fine.

Wander · 13 April 2022 17:40

This is true now, but will it always be true? If Ethereum sees the right technical upgrade, we could conceivably get rid of the oDAO one day, and I believe we should design with that future in mind.

Thinking long-term, I think everything except solution C is fragile to some extent. If C isn’t an option, I believe we should just do nothing for now and continue to watch the space.

As an aside, I think B as presented above is basically the same as D due to the requirement of centralized endpoints/hosting for broken nodes.

knoshua · 13 April 2022 17:47

We should design a rewards system that relies 100% on the oDAO with a future without oDAO in mind? I think that would be difficult. Are you suggesting to keep the current rewards system?

What exactly is the concern here outside of leaking oDAO ip addresses?

Wander · 13 April 2022 17:55

My concern is cementing the oDAO any more than necessary. The proposed new rewards system places the burden entirely upon the oDAO right now but can feasibly be transferred to NOs in the future (i.e. solution B).

The ideal solution is an iteration of B which preserves privacy and anti-fragility (maybe solution C?).

torfbolt · 13 April 2022 21:17

This. Using a public IPFS gateway or hosting the IPFS node on a small VPS is not more effort or cost than a VPN and mostly solves the attack vector issue. So I would go with the current solution and extend it with a configurable external IPFS node address.

mbs · 14 April 2022 00:35

I think it’s important that node operators’ IP addresses remain as anonymous as possible and preferably not in the hands of any person or service. (B) seems like the only option.

Ilu · 14 April 2022 07:36

Solution B would also give node operators more responsibility and possibly prepare them for future tasks similar to today’s oDAO duties.
Solutions with VPNs and external sources just add costs that are unnecessary if everyone can check for themselves.
But I think there should be options for node operators who want to rely on an external source.

enkriptix · 14 April 2022 21:25

I’m also leaning toward option B with option D as a backup.