NB- This proposal is a draft, but open for comments
NB2- After @superphiz generously sent us some RPL without explanation, we took advantage of some heavy Black Friday discounts and have started development. This proposal will be updated to reflect a larger reimbursement in the future, should we wind up in the red.
While there isn’t an official rubric from the GMC yet, I noticed that Joe posted one recently, and I’ve been itching to formalize a proposal for the Rescue Node. More and more node operators have benefited from it, but it continues to operate on the basis of poupas’s good will.
Please find the proposed Tech Spec here:
Find some prior discussion here:
Below, I’ll lay out my responses to Joe’s questions, and will update as the rubric evolves.
- What is the work being proposed?
We propose the construction of a custom reverse proxy which, when coupled with a CC/EC pair for each CC client type, can be used as a fallback node by Node Operators in emergencies or during maintenance.
The above tech spec addresses many of the immediate problems: centralization risks, trust assumptions and mev/tip theft risks, and DDoS/misuse protections.
- Is there any related work this builds off of?
@poupas has a functional proof-of-concept which has been running since the merge. This proposal would flesh it out and address some of the limitations of the existing rescue node.
- How does this help people looking to stake ETH for rETH?
A theoretical pool staker with concerns about the performance of a non-professional staker may have their doubts assuaged by the knowledge that they have recourse for outages in the form of the rescue node.
- How does this help rETH holders?
The rescue node provides additional financial security for rETH holders by improving the overall performance of the token. Simply, the rescue node facilitates better uptime for node operators, and since penalties for missed attestations are socialized across rETH holders as well as the node operator, there is a direct financial benefit to rETH holders.
Cost/benefit analysis will follow below.
- How does this help people looking to run a Rocket Pool node for the first time?
The main benefit is in the form of added confidence. A potential node operator who is concerned about their ability to administer their own node may, similar to the theoretical pool staker above, find assurance in the availability of the rescue node to cover “worst-case” scenarios.
- How does this help people already running a Rocket Pool node?
Following the merge, certain client combinations popular with Rocket Pool node operators became unreliable, and many node operators used the rescue node to either wait for bugfixes from client teams or change clients and resync without suffering downtime.
Since then, several more node operators have used the rescue node to resync after their chaindata became corrupt, or to switch clients due to ongoing reliability issues.
- How does this help the Rocket Pool community?
Beyond the benefits listed above, the rescue node would be an exemplar of community members building services and tools to help one another.
- Who is doing the work?
The custom software will be entirely open source, with development led by willing and able rocket scientists. Code contributions will be welcome from anyone in the community.
For development itself, as the scope of the work is relatively small, a sum of 50 RPL on completion is a reasonable rate. I estimate the work to take a cumulative 20 person-hours, with most of the complexity focused on supporting Prysm’s unique gRPC transport.
The other category of costs that this grant proposal addresses will be for infrastructure.
- What is the breakdown of the proposed work, in terms of milestones and/or deadlines?
If the grant is approved, work will begin with development on the prater testnet, and should take 3-4 weeks, comfortably.
- How is the work being tested? Is testing included in the schedule?
The entire system will be built and stress tested on prater before being switched over to mainnet.
Testing will include attempted theft of mev and tips, covering smoothing pool and non-smoothing pool nodes, as well as nodes with associated solo validators (hybrid and reverse-hybrid mode operators).
- How will the work be maintained after delivery?
Maintenance will also fall under the purview of the rocket scientists who feel technical enough to contribute. I’d prefer for @ken to be the arbiter of which scientists have direct access, as he is a sort of (un?)official leader.
Rocket scientists will maintain the rescue node. Due to certain trust assumptions, the list of maintainers will be made public to the degree that they are willing to dox themselves, and a node operator opting to use the rescue node will be asked to explicitly agree to the trust assumptions. More on this below.
- What is the acceptance criteria?
Acceptance criteria are perhaps best described by the tech spec: Rocket Pool Rescue Node - Tech Spec · GitHub
- What is the proposed payment schedule for the grant?
The cost of infrastructure is directly related to the number of beacon clients required to provide full service. As of today, that list is Teku, Lighthouse and Prysm.
Teku only works with a Teku fallback. Lighthouse only works with a Lighthouse fallback, or with a Teku fallback if doppelganger detection is disabled. Prysm only works with a Prysm fallback. As a result, at least 3 servers are required, each of which must be provisioned to run a full node.
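The compatibility constraints above can be written down as a small lookup table. This is only a sketch to make the "at least 3 servers" count explicit: the names `COMPATIBLE_FALLBACKS`, `pick_backend` and `servers_required` are hypothetical and are not taken from the tech spec.

```python
# Hypothetical compatibility table, transcribed from the paragraph above.
# Keys are the node operator's validator client; values are the beacon
# clients that can act as a fallback, with the always-safe choice first.
COMPATIBLE_FALLBACKS = {
    "teku": ["teku"],
    "lighthouse": ["lighthouse", "teku"],  # teku only if doppelganger detection is off
    "prysm": ["prysm"],
}


def pick_backend(validator_client: str, doppelganger_detection: bool = False) -> list[str]:
    """Return the fallback beacon clients usable by this validator client."""
    client = validator_client.lower()
    candidates = list(COMPATIBLE_FALLBACKS[client])
    if client == "lighthouse" and doppelganger_detection:
        # A Teku fallback is not usable while doppelganger detection is enabled.
        candidates.remove("teku")
    return candidates


def servers_required() -> int:
    """Count the distinct beacon clients needed to serve everyone unconditionally."""
    return len({fallbacks[0] for fallbacks in COMPATIBLE_FALLBACKS.values()})


print(servers_required())
```

Because each validator client's only unconditional fallback is its own beacon client, the count cannot drop below one server per supported client.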
When Nimbus adds so-called “split-mode”, a fourth may be required. As such, I propose funding this endeavor with enough RPL to cover the hosting costs of 4 nodes. Thumbing through OVH, I find that a serviceable server runs about $150 a month, bringing the cost of 4 to $600.
This would be $554 per inflation interval (28 days), or 35 RPL. To protect against price movement, and because my math so far has been back-of-envelope, I’d suggest allocating 50 RPL per interval, or 650 RPL per year (13 intervals). Just enough RPL would be liquidated once a month to reimburse whoever paid the hosting bill, and any excess RPL would be retained for future costs or intervals where the RPL price no longer covers the cost.
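For anyone who wants to check the back-of-envelope hosting math, here is a short sketch. The dollar figures and the 50 RPL allocation come from the paragraphs above; the 13 intervals per year follows from the 28-day interval, and the two RPL prices are derived values, not quotes.

```python
SERVERS = 4
USD_PER_SERVER_MONTH = 150
INTERVALS_PER_YEAR = 13            # 28-day inflation intervals: 13 * 28 = 364 days
RPL_PER_INTERVAL_PROPOSED = 50
RPL_PER_INTERVAL_ESTIMATE = 35     # the "35 RPL" estimate from the post

monthly_usd = SERVERS * USD_PER_SERVER_MONTH            # hosting bill per month
yearly_usd = monthly_usd * 12                           # hosting bill per year
usd_per_interval = yearly_usd / INTERVALS_PER_YEAR      # hosting bill per 28-day interval
rpl_per_year = RPL_PER_INTERVAL_PROPOSED * INTERVALS_PER_YEAR

# RPL price implied by "$554 is about 35 RPL" (derived, not a quoted price).
implied_rpl_price = usd_per_interval / RPL_PER_INTERVAL_ESTIMATE

# RPL price below which 50 RPL no longer covers one interval of hosting.
floor_rpl_price = usd_per_interval / RPL_PER_INTERVAL_PROPOSED

print(f"monthly hosting:      ${monthly_usd}")
print(f"per 28-day interval:  ${usd_per_interval:.2f}")
print(f"RPL per year:         {rpl_per_year}")
print(f"implied RPL price:    ${implied_rpl_price:.2f}")
print(f"coverage floor price: ${floor_rpl_price:.2f}")
```

The gap between the implied price and the floor price is the cushion that the 50 RPL allocation provides against a falling RPL price.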
After 1 year in service, I believe it makes sense for the GMC to reevaluate the grant and adjust the disbursement based on the balance carried by the maintainers.
Servers will be hosted on a monthly basis, allowing the GMC to pull the plug on the project (preferably providing advance notice to the maintainers).
Additionally, after delivering the product described by the tech spec, I suggest that the developers be paid a lump sum of 50 RPL for their work.
Finally, I believe the GMC should reimburse @poupas for his costs maintaining the prototype, preferably after we have migrated to the full service rescue node, so that he can keep it active in the interim.
- How will the GMC verify that the work’s deliveries match the proposed cadence?
As development will be open source, I expect that we will be able to make regular progress reports to the GMC, and the GMC can audit the code at any time.
That’s all for Joe’s rubric. Below, as mentioned above, are some more details on the trust assumptions and a cost/benefit analysis.
The largest security concern with the rescue node stems from tip/mev theft. Please familiarize yourself with this article: Exploring Eth2: Stealing Inclusion Fees from Public Beacon Nodes | Symphonious
In general, any public-facing beacon node has a vulnerability on this front. A bad actor could query the rescue node to set the fee recipients to their own address. As such, the tech spec mitigates this issue by tracking the appropriate fee recipient for any given validator, and rejecting requests to change it. Rocket Pool has a unique advantage in this area, because a valid fee recipient for a given minipool is either the smoothing pool or a fee distributor contract whose address is well-known. However, two areas of trust remain:
- A node operator using the rescue node is explicitly trusting its maintainers to act in good faith. If one maintainer went rogue, they could feasibly steal tips/mev for any proposals submitted through the rescue node.
- A node operator using the rescue node who has solo validators, and decides to use the rescue node for their solo validators as well, must have mev-boost enabled. This is because the beacon node specification requires a signed message for the register_validator endpoint, which tells us that the owner of the validator is the one setting the fee recipient, whereas the prepare_beacon_proposer endpoint does not. Non-mev-boost validators only call the latter.
Any node operator who uses the rescue node will be asked to acknowledge that they understand and agree to these risks, and additionally to agree that they will not share their rescue node credentials with any third party.
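To illustrate the fee recipient check described above, here is a minimal sketch of the decision the proxy would make before forwarding a request. All names here (`Minipool`, `expected_fee_recipient`, `allow_fee_recipient`) and the placeholder address are hypothetical; the real design is whatever the tech spec says. The sketch also assumes the proxy already knows whether a node is opted into the smoothing pool, and it does not attempt the signature verification needed for solo validators.

```python
from dataclasses import dataclass

# Placeholder address for illustration only; NOT the real smoothing pool contract.
SMOOTHING_POOL = "0x" + "aa" * 20


@dataclass(frozen=True)
class Minipool:
    pubkey: str
    fee_distributor: str      # fee distributor contract of the owning node
    in_smoothing_pool: bool   # assumed to be known to the proxy


def expected_fee_recipient(minipool: Minipool) -> str:
    """The only fee recipient this sketch accepts for the given minipool."""
    return SMOOTHING_POOL if minipool.in_smoothing_pool else minipool.fee_distributor


def allow_fee_recipient(minipools: dict[str, Minipool], pubkey: str, requested: str) -> bool:
    """Return True if a request to set `requested` for `pubkey` may be forwarded."""
    minipool = minipools.get(pubkey)
    if minipool is None:
        # Unknown validator (e.g. a solo validator). This sketch refuses; a real
        # proxy would instead require a signed register_validator message.
        return False
    # Compare case-insensitively, since hex addresses may be checksummed.
    return requested.lower() == expected_fee_recipient(minipool).lower()
```

A request that names any other address for a known minipool is rejected rather than rewritten, which matches the "rejecting requests to change it" behaviour described above.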
The two main beneficiaries of the rescue node are node operators and rETH holders, albeit in slightly different ways:
- rETH holders benefit from improved performance by node operators. The fewer attestations missed by the protocol, the better the APR.
- Node Operators benefit more directly: the fewer attestations they miss, the more commission they earn and the more rewards they earn on their own staked Ether.
A perhaps overly simplistic cost/benefit model would simply look at the total funds lost by the platform holistically for each missed attestation, and extrapolate to the number of validators such that the cost of the service is equal to the missed funds.
Based on today’s per-attestation reward of 14 Twei and today’s Ether price of $1200, an offline validator misses out on 1.7 cents in rewards per epoch, and incurs a penalty of 1.3 cents, for a total cost of 3 cents per epoch. Per day, an offline validator loses $6.75 (a bit over half in opportunity costs and a bit under half in penalties). The rescue node costs $20 per day at $600 monthly, so if an average of 3 validators (NB: minipools, not nodes) are using the rescue node, it breaks even holistically.
Now, there are other costs to being offline (missed proposals, mev and tips, for example, are costs I can’t measure, and I haven’t factored in sync committees, as they are quite rare), so the actual break-even is likely lower than 3 validators. Of course, if Ether appreciates or depreciates, the break-even shifts, as the infrastructure costs are fixed.
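The break-even arithmetic can be reproduced with a few lines. The inputs are the figures quoted above (14 Twei per attestation, $1200 per Ether, a 1.3 cent penalty, $600 monthly hosting); the 225 epochs per day follows from 32 slots of 12 seconds per epoch, and the 30-day month is the same rounding used for the "$20 per day" figure.

```python
import math

ETH_PRICE_USD = 1200
REWARD_WEI = 14 * 10**12          # 14 Twei per attestation, as quoted above
PENALTY_USD_PER_EPOCH = 0.013     # penalty figure taken from the post
SECONDS_PER_EPOCH = 32 * 12       # 32 slots of 12 seconds
EPOCHS_PER_DAY = 24 * 60 * 60 // SECONDS_PER_EPOCH
MONTHLY_COST_USD = 600
DAYS_PER_MONTH = 30               # rounding used for the $20 per day figure

# Missed reward per epoch, converted from wei to USD (about 1.7 cents).
reward_usd_per_epoch = REWARD_WEI / 10**18 * ETH_PRICE_USD

# Total daily cost of one offline validator: missed rewards plus penalties.
loss_per_validator_day = (reward_usd_per_epoch + PENALTY_USD_PER_EPOCH) * EPOCHS_PER_DAY

rescue_cost_per_day = MONTHLY_COST_USD / DAYS_PER_MONTH
break_even_validators = rescue_cost_per_day / loss_per_validator_day

print(f"epochs per day:          {EPOCHS_PER_DAY}")
print(f"loss per validator-day:  ${loss_per_validator_day:.2f}")
print(f"rescue node cost/day:    ${rescue_cost_per_day:.2f}")
print(f"break-even validators:   {math.ceil(break_even_validators)}")
```

Changing `ETH_PRICE_USD` shows how the break-even moves with the Ether price while the hosting cost stays fixed.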
At 650 RPL per year (13 inflation intervals), this would represent 1.6% of the yearly budget of the GMC.