Regarding the Hosting of the Rewards Merkle Tree

jcrtp · 16 April 2022 02:01

Thank you everyone for the great discussion so far. The feedback here has been impressively diverse, but I would expect nothing less from a protocol that maximizes decentralization! Let me try to recap what I’ve seen so far below:

- Some people really like the maximum decentralization route (Solution B), but the archive node limitation precludes it from being the only solution (unless the user runs an EC such as Erigon or Besu, which don't have issues with archival state).
- Some people don't see what the big deal is, and just suggest we throw the oDAO-generated trees on GitHub or something. After all, we host the Smartnode installer + binaries on GitHub, so we're already trusting it as a centralized service.
- Some people think hosting it on a decentralized file storage platform is the best way to go. IPFS is still in the running, but anyone pinning it has to be comfortable exposing their IP address (which effectively means running the IPFS node offsite or configuring a VPN service).

Based on all of this, here are my thoughts:

1. Every node operator should have the ability to generate the rewards tree for an interval from scratch if they so choose. The node operator will be responsible for having access to an archive node, which may or may not be their local EC for validation duties (e.g., they could specify a free-tier Alchemy endpoint as their archive node purely for the purposes of generating the tree). For the Smoothing Pool calculation, they could still use their own Beacon Node.
- This is roughly analogous to building the Smartnode from source. The source is there, anyone can do it (and some people do!), but most users just use the stuff I upload to GitHub.
2. Because of the archive node requirement, and because a fair number of people don't care about this feature, this would be opt-in rather than opt-out. The default behavior would be to acquire the oDAO-generated tree for each interval from a third-party source (discussed in the next point). However, node operators will be able to disable this: I will add a function that generates the tree for an interval from scratch and saves it. The Smartnode would then programmatically compare the resulting Merkle root with the canonical root stored in the contracts to confirm they match. For people who want to use it, this has the same effect as Solution B.
3. The oDAO will also host their generated trees on a decentralized storage platform. It's important that this be decentralized because we don't want to rely exclusively on the development team to host the files (e.g., on GitHub or the Rocket Pool website); if we are unavailable, the oDAO should still be able to publish the file and users should still be able to acquire it. A lot of suggestions have been provided here; the one I am considering most right now is https://web3.storage, for the following reasons:
- It fits nicely within the current Smartnode stack (it has Go bindings)
- oDAO members could have their own independent API keys and upload the trees they generated (How to store data using Web3.Storage | Web3.Storage Documentation)
- It uses IPFS as a backend, with files stored on Filecoin for high availability
- It's currently free to use, and our files are small enough that I'm not overly concerned about pricing in the near future if that were to change
- @superphiz (one of our oDAO members) highly endorses it
4. We can host redundant copies in a dedicated "rewards tree" GitHub repository and probably even in a dedicated Discord thread. I'll bake the repository into the Smartnode so that if it can't reach the IPFS gateway, it can use the repository as a backup.
5. For node operators, the Smartnode would automatically download the tree from the web3.storage gateway (How to retrieve data from Web3.Storage | Web3.Storage Documentation) as the primary source, and from the GitHub repo as a backup if the gateway is offline. If both fail, the node operator can still get it manually via any other IPFS endpoint (or run an IPFS node themselves, or ask someone for it on Discord… the list goes on). Since it's a Merkle tree, any modification will conflict with the Merkle root stored in the contracts, so nobody can forge a malicious tree and distribute it without it being detected.
6. I'm still on the fence about adding an IPFS node to the Smartnode stack because of this whole IP exposure issue. If I do, it will be on the condition that it automatically pins all of the rewards files and serves as the primary method for downloading the rewards tree. It'll have to come with good explanations of the IP problem so the node operator can make an informed decision.

Whatever the case, I think this approach will make everyone happy. IP address exposure becomes opt-in (even for the oDAO). Anonymity champions can generate the tree from scratch (provided they have the capability) or get it from somewhere that doesn't expose the node's IP, like the redundant GitHub copies or the Discord thread. People who don't see what the problem is will automatically and transparently get it from one of the hosted sources without a second thought. I think it's the best UX for everyone.

Are you on board with this plan, or are there any concerns with it?

How do you feel about point #6 above?

dEEtoo · 16 April 2022 02:18

This is a well-thought-out, balanced approach with safety and redundancy built in. I like it, I like it a lot.

Mig21 · 16 April 2022 07:50

I like the plan, but IMHO I don't like the IPFS node add-on. Some ISPs even block IPFS node IPs, so there is a risk that some NOs will be blacklisted.

It's an added concern for new NOs.

torfbolt · 16 April 2022 10:37

Sounds good, and I would also argue against including the IPFS node, since it's not a core requirement of this proposal. Also, every additional component in the stack consumes resources and bandwidth and enlarges the attack surface.

fornax · 16 April 2022 13:27

I believe this is a solid plan, making the data widely available AND verifiable at the same time.

I also think including an IPFS node into the stack should be avoided (more resources needed and a higher attack surface).

mario · 16 April 2022 17:36

Adding a vote for (E) - do nothing.

The discussion rests on a very large assumption: that I need to know your IP address in order to target you. This isn't the case. Sure, it might make my job easier, but anyone who poses a real threat already has the tools and know-how to find you.

For example: it's straightforward to modify a client to record IPs and churn through peers until you've mapped the entire ETH network. You can narrow ETH2 clients down to the subset run by RP node operators by observing that RP blocks include "RP" graffiti. To find Watchtower IPs, connect to the ~6k ETH1 clients and note which ones are first to relay transactions to the oDAO contract. (These are just two attack vectors off the top of my head; people much smarter than me have certainly cooked up more efficient methods.)

Not to mention all the places your IP is already leaked alongside your wallet address: any site you've authorized with MetaMask, including the RP website, the RP Metrics Dashboard, Beaconcha.in, Etherscan, etc. A bad actor involved with any of these projects could tie your IP back to your wallet address.

All of this to say, I think this is a real problem but it’s not something RP can begin to fix because it’s fundamentally an ETH problem. Given that leaky IP addresses already exist and will continue to exist regardless of what we do, I don’t think it’s worth worrying about because the most we can achieve is “security through obscurity.”

Egk10 · 18 April 2022 19:45

I'm not a dev and have no cybersecurity expertise. I run RP nodes and solo stake at home as an enthusiastic Ethereum investor. Reading these posts scares me a lot; I'm pretty sure I've accidentally exposed my IP at some point. What can I do to increase my security?

connerjason · 19 April 2022 08:27

Sorry to come in late: what is the estimated size? Are we actually talking about several KB? Will the file size grow over time?

Ilu · 19 April 2022 08:29

Good approach, but I wouldn't add the IPFS node to the Smartnode stack either. It's not needed, because concerned users would either generate the tree themselves or connect to GitHub (or another service).

mao · 19 April 2022 10:50

I'm on board with everything except adding the IPFS node to the Smartnode stack.

Uninformed use can expose and harm the node operator. Many people won't know or realize the consequences, so I wouldn't add it. The best way to prevent NOs from getting burned is to not expose them to fire in the first place.

jcrtp · 20 April 2022 00:56

Ok, it sounds like people are generally happy with this approach, and no one seems to see the need for an integrated IPFS client so I will remove it. I will start working on a full write-up of the new Merkle rewards system now that we have these details ironed out. Thanks for your input everyone!

jcrtp · 20 April 2022 01:09

Each checkpoint gets its own file. The one I generated for Mainnet as an example was about 850 KB.

dkderek · 22 April 2022 16:35

I'm on board with this solution and don't feel an IPFS node needs to be added to the Smartnode stack. Excellent write-up, jcrtp.