Rescue Node Grant Proposal

NB- This proposal is a draft, but is open for comments.
NB2- After @superphiz generously sent us some RPL without explanation, we took advantage of some heavy Black Friday discounts and have started development. Should we wind up in the red, this proposal will be updated to reflect a larger reimbursement.

While there isn’t an official rubric from the GMC yet, I noticed that Joe posted one recently. I’ve been itching to formalize a proposal for the Rescue Node: more and more node operators have benefited from it, yet it continues to operate on the basis of poupas’s good will.

Please find the proposed Tech Spec here:

Find some prior discussion here:

Below, I’ll lay out my responses to Joe’s questions, and will update as the rubric evolves.


=== Project ===

- What is the work being proposed?

We propose the construction of a custom reverse proxy which, when coupled with a CC/EC pair for each CC client type, can be used as a fallback node by Node Operators in emergencies or during maintenance.

The above tech spec addresses many of the immediate problems: centralization risks, trust assumptions and mev/tip theft risks, and DDoS/misuse protections.

- Is there any related work this builds off of?

@poupas has a functional proof-of-concept which has been running since the merge. This proposal would flesh it out and address some of the limitations of the existing rescue node.

=== Benefits ===

- How does this help people looking to stake ETH for rETH?

A theoretical pool staker with concerns about the performance of a non-professional staker may have their doubts assuaged by the knowledge that they have recourse for outages in the form of the rescue node.

- How does this help rETH holders?

The rescue node provides additional financial security for rETH holders by improving the overall performance of the token. Simply, the rescue node facilitates better uptime for node operators, and since penalties for missed attestations are socialized across rETH holders as well as the node operator, there is a direct financial benefit to rETH holders.

Cost/benefit analysis will follow below.

- How does this help people looking to run a Rocket Pool node for the first time?

The main benefit is in the form of added confidence. A potential node operator who is concerned about their ability to administer their own node may, similar to the theoretical pool staker above, find assurance in the availability of the rescue node to cover “worst-case” scenarios.

- How does this help people already running a Rocket Pool node?

Following the merge, certain client combinations popular with Rocket Pool node operators became unreliable, and many node operators used the rescue node to either wait for bugfixes from client teams or change clients and resync without suffering downtime.

Since then, several more node operators have used the rescue node to resync after their chaindata became corrupt, or to switch clients due to ongoing reliability issues.

- How does this help the Rocket Pool community?

Beyond the benefits listed above, the rescue node would be an exemplar of community members building services and tools to help one another.

=== Team ===

- Who is doing the work?

The custom software will be entirely open source, with development led by willing and able rocket scientists. Code contributions will be welcome from anyone in the community.

For development itself, as the scope of the work is relatively small, a sum of 50 RPL on completion is a reasonable rate. I estimate the work to take a cumulative 20 person-hours, with most of the complexity focused on supporting Prysm’s unique gRPC transport.

The other category of costs that this grant proposal addresses will be for infrastructure.

- What is the breakdown of the proposed work, in terms of milestones and/or deadlines?

If the grant is approved, work will begin with development on the Prater testnet, and should comfortably take 3-4 weeks.

- How is the work being tested? Is testing included in the schedule?

The entire system will be built and stress tested on Prater before being switched over to mainnet.
Testing will include attempted theft of MEV and tips, smoothing pool and non-smoothing pool nodes, as well as nodes with associated solo validators (hybrid and reverse-hybrid mode operators).

- How will the work be maintained after delivery?

Maintenance will also fall under the purview of the rocket scientists who feel technical enough to contribute. I’d prefer for @ken to be the arbiter of which scientists have direct access, as he is a sort of (un?)official leader.

Rocket scientists will maintain the rescue node. Due to certain trust assumptions, the list of maintainers will be made public to the degree that they are willing to dox themselves, and a node operator opting to use the rescue node will be asked to explicitly agree to the trust assumptions. More on this below.

=== Payment and Verification ===

- What is the acceptance criteria?

Acceptance criteria are perhaps best described by the tech spec: Rocket Pool Rescue Node - Tech Spec · GitHub

- What is the proposed payment schedule for the grant?

The cost of infrastructure is directly related to the number of beacon clients required to provide full service. As of today, that list is Teku, Lighthouse and Prysm.

Because Teku only works with a Teku fallback, Lighthouse only works with Lighthouse or a Teku fallback if doppelganger detection is disabled, and Prysm only works with a Prysm fallback, at least 3 servers are required, each of which must be provisioned to run a full node.

When Nimbus adds so-called “split-mode”, a fourth may be required. As such, I propose funding this endeavor with enough RPL to cover the hosting costs of 4 nodes. Thumbing through OVH, I find that a serviceable server runs about $150 a month, bringing the cost of 4 to $600.

This would be $554 per inflation interval (28 days), or 35 RPL. To protect against price movement, and because my math so far has been back-of-envelope, I’d suggest allocating 50 RPL per interval, or 650 RPL per year (13 intervals). Just enough RPL would be liquidated once a month to reimburse whoever paid the hosting bill, and any excess RPL would be retained for future costs or intervals where the RPL price no longer covers the cost.
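
As a quick sanity check on the numbers above, here is a minimal sketch of the arithmetic. It assumes 13 intervals of 28 days per year; the implied RPL price is derived from the 35 RPL figure rather than taken from a market quote, and the variable names are purely illustrative.

```python
# Back-of-envelope hosting budget, using the figures quoted in this proposal.
SERVERS = 4
COST_PER_SERVER_USD = 150      # rough monthly price per server
INTERVALS_PER_YEAR = 13        # assumption: 13 x 28 days = 364 days

monthly_cost_usd = SERVERS * COST_PER_SERVER_USD
yearly_cost_usd = monthly_cost_usd * 12
cost_per_interval_usd = yearly_cost_usd / INTERVALS_PER_YEAR

# RPL price implied by "35 RPL per interval" (derived, not a market quote).
implied_rpl_price_usd = cost_per_interval_usd / 35

# Proposed allocation, with headroom for price movement.
rpl_per_interval = 50
rpl_per_year = rpl_per_interval * INTERVALS_PER_YEAR

print(f"${cost_per_interval_usd:.0f} per interval")
print(f"implied RPL price: ${implied_rpl_price_usd:.2f}")
print(f"{rpl_per_year} RPL per year")
```

If the RPL price falls below the implied figure by more than the 50-vs-35 headroom allows, the retained excess from earlier intervals is what would cover the gap.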

After 1 year in service, I believe it makes sense for the GMC to reevaluate the grant and adjust the disbursement based on the balance carried by the maintainers.

Servers will be hosted on a monthly basis, allowing the GMC to pull the plug on the project (preferably providing advance notice to the maintainers).

Additionally, after delivering the product described by the tech spec, I suggest that the developers be paid a lump sum of 50 RPL for their work.

Finally, I believe the GMC should reimburse @poupas for his costs maintaining the prototype, preferably after we have migrated to the full service rescue node, so that he can keep it active in the interim.

- How will the GMC verify that the work’s deliveries match the proposed cadence?

As development will be open source, I expect that we will be able to make regular progress reports to the GMC, and the GMC can audit the code at any time.


Addenda

That’s all for Joe’s rubric. Below, as mentioned above, are some more details on the trust assumptions and a cost/benefit analysis.

Trust Assumptions

The largest security concern with the rescue node stems from tip/mev theft. Please familiarize yourself with this article: Exploring Eth2: Stealing Inclusion Fees from Public Beacon Nodes | Symphonious

In general, any public-facing beacon node has a vulnerability on this front. A bad actor could query the rescue node to set the fee recipients to their own address. As such, the tech spec mitigates this issue by tracking the appropriate fee recipient for any given validator, and rejecting requests to change it. Rocket Pool has a unique advantage in this area, because a valid fee recipient for a given minipool is either the smoothing pool or a fee distributor contract whose address is well-known. However, two areas of trust remain:

  1. A node operator using the rescue node is explicitly trusting its maintainers to act in good faith. If one maintainer went rogue, they could feasibly steal tips/mev for any proposals submitted through the rescue node.
  2. A node operator using the rescue node who has solo validators, and decides to use the rescue node for their solo validators as well, must have mev-boost enabled. This is because the beacon node specification requires a signed message for the register_validator endpoint, which tells us that the owner of the validator is the one setting the fee recipient, whereas the prepare_beacon_proposer endpoint does not. Non-mev-boost validators only call the latter.

Any node operator who uses the rescue node will be asked to acknowledge that they understand and agree to these risks, and additionally to agree that they will not share their rescue node credentials with any third party.
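
To make the fee recipient mitigation concrete, here is a minimal sketch of the kind of check the proxy could apply before forwarding a fee recipient request. It is not the tech spec's implementation: the `Minipool` type, the function names, and the placeholder addresses are all hypothetical, and a real proxy would look these values up from on-chain data.

```python
from dataclasses import dataclass

# Placeholder value only; not a real contract address.
SMOOTHING_POOL_ADDRESS = "0xSmoothingPoolPlaceholder"


@dataclass(frozen=True)
class Minipool:
    """Hypothetical record the proxy keeps for each known validator."""
    pubkey: str
    fee_distributor: str      # the node's fee distributor contract
    in_smoothing_pool: bool   # whether the node opted into the smoothing pool


def expected_fee_recipient(minipool: Minipool) -> str:
    """Return the one fee recipient that is valid for this minipool."""
    if minipool.in_smoothing_pool:
        return SMOOTHING_POOL_ADDRESS
    return minipool.fee_distributor


def is_fee_recipient_allowed(minipool: Minipool, requested: str) -> bool:
    """Accept a request only if it names the expected fee recipient."""
    return requested.lower() == expected_fee_recipient(minipool).lower()


# Example: a request naming any other address would be rejected.
mp = Minipool(pubkey="0xValidatorPlaceholder",
              fee_distributor="0xDistributorPlaceholder",
              in_smoothing_pool=False)
print(is_fee_recipient_allowed(mp, "0xDistributorPlaceholder"))
print(is_fee_recipient_allowed(mp, "0xAttackerPlaceholder"))
```

Note that this only protects against outside callers; it does nothing about the first trust assumption, since a rogue maintainer could bypass the check entirely.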

Cost/Benefit Analysis

The main two beneficiaries of the rescue node are node operators and rETH holders, albeit in slightly different ways:

  1. rETH holders benefit from improved performance from node operators. The fewer attestations missed by the protocol, the better the APR.
  2. Node Operators benefit more directly: the fewer attestations they miss, the more commission they earn and the more rewards they earn on their own staked Ether.

A perhaps overly simplistic cost/benefit model would simply look at the total funds lost by the platform holistically for each missed attestation, and extrapolate to the number of validators such that the cost of the service is equal to the missed funds.

Based on today’s per-attestation reward of 14 Twei and today’s Ether price of $1200, an offline validator misses out on 1.7 cents in rewards per epoch, and incurs a penalty of 1.3 cents, for a total cost of 3 cents per epoch. Per day, an offline validator loses $6.75 (a bit over half in opportunity costs and a bit under half in penalties). The rescue node costs $20 per day at $600 monthly, so if an average of 3 validators (NB: minipools, not nodes) are using the rescue node, it breaks even holistically.
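
The break-even estimate above can be reproduced with a short calculation. This is only a sketch using the figures quoted in this post; the 225 epochs per day follows from 32 slots of 12 seconds per epoch, and the 30-day month is an assumption made to match the $20 per day figure.

```python
import math

ETH_PRICE_USD = 1200
EPOCHS_PER_DAY = 225          # 32 slots x 12 s = 384 s per epoch; 86400 / 384

# 14 Twei = 14e12 wei; 1 ETH = 1e18 wei.
missed_reward_usd = 14e12 / 1e18 * ETH_PRICE_USD   # about 1.7 cents
penalty_usd = 0.013                                # penalty quoted above

# The post rounds the total to 3 cents per epoch; do the same here.
loss_per_epoch_usd = round(missed_reward_usd + penalty_usd, 2)
loss_per_day_usd = loss_per_epoch_usd * EPOCHS_PER_DAY

# $600 per month over an assumed 30-day month.
rescue_node_cost_per_day_usd = 600 / 30

break_even_validators = math.ceil(rescue_node_cost_per_day_usd / loss_per_day_usd)

print(f"loss per offline validator per day: ${loss_per_day_usd:.2f}")
print(f"break-even validators: {break_even_validators}")
```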

Now, there are other costs to being offline (missed proposals, MEV, and tips, for example, are an immeasurable cost, and I haven’t factored in sync committees, as they are quite rare), so the actual break-even is likely lower than 3 validators. Of course, if Ether appreciates or depreciates, the break-even shifts, as the infrastructure costs are fixed.

At 650 RPL per year (13 inflation intervals of 28 days), this would represent 1.6% of the yearly budget of the GMC.

Signalling Poll: Do you support the execution of this grant proposal?

  • Yes
  • No
  • Unsure, I will ask follow up questions below


I have been using the current node provided by Poupas. Extremely useful when I swapped EC clients.

I approve (to the limited extent of my technical knowledge). My thoughts:

1. I strongly support retroactive funding for @poupas

Obviously reimburse the cost of running the node, but also include a reward for taking the risk. This is the kind of community service and ingenuity that makes Rocket Pool a best-in-market product, and the ability to run prototypes before approved roll-out is invaluable.

2. I think the costs should be defrayed by the people using the rescue node

A) 1.6% is a fairly large amount in perpetuity, and optimally one-time grants should give way to sustainable products. B) The product won’t be used equally by all NOs; many have their own backups, and small holders (1 minipool) probably won’t go through the time and risk to save <$20. C) a small fee will decrease overuse of the system, while still making it very EV positive to use the backup in times of need.

I suggest an expected (not required) contribution to the pDAO treasury (EDIT: of) based on the inflationary rewards of a minimum collateral minipool (this fee would be currently ~0.03 RPL per day per minipool) for NOs using the service. This will not penalize folks with more RPL staked, but recognizes that while the NO is using this public good, he/she is not actually doing all the things NOs get inflationary RPL rewards to do (the Etherscan argument). It is likely with a small fee the program may be budget neutral.

3. Off topic, but :sob: your assumption that RPL price will not keep pace with inflation over a year. Surely the devs will do something.
reply to object_object

When I’m on the opposite side of a debate from object_object, I know I’ve really stepped in it. But here goes:

My proposal was intended to be a fixed rate per minipool (not node, although I see my wording was poorly chosen), so based on my arbitrary fee system, in real terms, napkin math:

NO with 1 minipool, min RPL: NO: gains $3.88/day +execution, rETH gains $2.86/day +execution, voluntary fee to NO $0.45/day (from RPL)

NO with 100 minipools, min RPL: NO: gains $388/day +execution, rETH gains $286/day +execution, voluntary fee to NO $45/day (from RPL)

To me, the question is if you want this to be a public good like sidewalks which are so fundamental to society and so hard to police that we just fund them centrally, or if you want this to be a public good like national parks or public transport, where the capital expenditure is centrally coordinated and usership is subsidized, but users are expected to pay some portion of ongoing costs to promote sustainability.
I think the grants committee should be funding initial development, but when possible projects should be self-sustaining; otherwise you end up with 5 or 10 (terrific) projects with ongoing expenses that eat into future R&D. To me this particular project lends itself very well to sustainability, as the NO can see directly that EV is extremely positive (i.e. the benefit is not socialized). Anyhow, that’s my take.

I strongly disagree (this is my personal opinion, not representative of the GMC, which I’m a member of):

The current prototype has been used by both whales and single minipool NOs alike, so I don’t think your assertion B) is accurate.

1.6% is tiny compared to all the benefits it brings:

  • Benefits rETH holders directly by increasing the APR and reducing the number of missed attestations and proposals.
  • Benefits members of the smoothing pool directly by reducing the number of missed proposals, increasing the smoothie rewards.
  • Makes rETH more competitive by boosting our effectiveness stats, which are tracked on sites like rated.network.
  • Increases the appeal of becoming a Node Operator with Rocket Pool, which would be the only staking protocol providing this service.

Since this benefits almost all users of Rocket Pool, it doesn’t make sense to only charge the NO using it.
Asking for a fixed rate per node is especially bad: It would severely affect the EV of small NOs, who are more likely to use the rescue node, as it doesn’t make financial sense for them to set up their own fallback node.

The existence of MEV lottery blocks also makes charging for it a losing proposition: If we lose a single lottery proposal because the NO is offline and didn’t want to pay for the rescue node, rETH holders alone could lose more than the 650 RPL the rescue node would’ve cost in an entire year.

I’ve made a few edits since posting, I figure I should summarize them here since it’s unreasonable to expect people to reread the whole dang thing:

  1. I was convinced that offering to develop it for free set a bad precedent. The work is small so the cost of development is small, even at a reasonable compensation, so I’ve added in a lump sum payout of 50 RPL on completion.

  2. Poupas and I are still spitballing on how to handle solo validators. It may come to pass that the MVP here supports only solo validators who use mev-boost, and we find a way to extend support to non-mev-boost-solo-validators in the future. Of course, solo validators will only be allowed if they are owned by RP node operators.

  3. I had previously overstated the risk to solo validators… the risk was only to non-mev-boost-enabled solo validators, and as I mentioned above, we’re probably adding support for them after the MVP is developed.

Based on Patches’ proposal, this would seem to be a no-brainer. It has already proven its usefulness during the Merge and will continue to do so going forward, I believe.

IMO, this service will also allow Rocket Pool to stand out from the crowd of staking protocols; if I were a prospective staker going over the various choices, this is the sort of detail that would significantly increase my confidence to take the plunge with Rocket Pool.

In other words, it would pay for itself multiple times over just for the marketing value of it!


I do like the The Met model, and think it could get us far in this case. I think users are grateful and many would contribute gladly.

  • No required fee
  • No policing
  • Very clear suggested donation (x per day, x per pool per day, x% of the earnings it saved you); we should spoon-feed how much the total suggestion is

edit: @Patches corrected me twice from the MET to the Met to the The MET, to refer to the Metropolitan Museum of Art

“Node Operators who do not agree with the relays selected by the maintainers have the option of disabling mev-boost.”

Potentially worrisome. As is the general “MEV is optional” nature of Rocket Pool, though. If the intent is to be competitive in APY, then running MEV in some form is not optional.

Agreed that it needs to be limited to emergencies or it becomes a centralizing force. I’d love to see proposed usage rates / lockout times for this, and the way those were arrived at. Apologies if that’s in the MD and I missed it.

With regards to MEV boost, we’ll do our best to mirror whatever requirements are set forth by the protocol. We can’t force a VC to enable blind block signing, though, so we have limited control. I’m not sure there’s a way for us to require MEV boost for the rescue node, but if there is, and the protocol comes to require it, so will we. I’ll update the proposal to clarify.

In terms of rate limiting, my thought has generally been to have a grace period long enough to order and replace hardware or resync a client (perhaps 10-15 days), and a timeout at least as long again. I’m open to feedback here. I’d like to strike a balance between protecting the decentralization of Rocket Pool and preserving pool staker funds from penalties.
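
As a rough illustration of the grace-period-plus-timeout idea, here is a minimal sketch. The 15-day values are placeholders taken from the upper end of the range above, not settled parameters, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Placeholder values; the actual durations are still open for feedback.
GRACE_PERIOD = timedelta(days=15)   # window in which the rescue node may be used
TIMEOUT = timedelta(days=15)        # lockout after the grace period ends


def may_use_rescue_node(first_use: datetime, now: datetime) -> bool:
    """Allow access during the grace period, deny it during the timeout.

    Once the timeout has fully elapsed, access is allowed again; the caller
    would then record a new first_use to start a fresh grace period.
    """
    elapsed = now - first_use
    if elapsed < GRACE_PERIOD:
        return True
    return elapsed >= GRACE_PERIOD + TIMEOUT


start = datetime(2022, 12, 1)
print(may_use_rescue_node(start, start + timedelta(days=5)))
print(may_use_rescue_node(start, start + timedelta(days=20)))
print(may_use_rescue_node(start, start + timedelta(days=31)))
```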

Maybe a quick poll will help decide:

  • A node operator relying 100% on the rescue node should still operate at a net loss
  • … should operate at net neutral
  • … should operate at diminished profits
