v1.6.1 Rewards Calculation Issue and Candidate Solutions
This post contains a post-mortem analysis of the issue in v1.6.1 and below that caused high CPU load after Redstone’s first rewards interval on September 1st, 2022.
It also contains an explanation of the short-term solution employed in v1.6.2, and an exploration of a few candidate options for a more robust medium-term solution.
Posts like this one generally go into the rocketpool-research repository.
However, because this contains a potential redesign with long-term ramifications, I would like to collect everyone's feedback on it in the same way that I did with the Smoothing Pool design several months ago, as that dialog proved to be very helpful.
Background Context
Ethereum Event Logs
To understand the problem and the solution, I’ll first briefly cover the EVM, events, and event logs provided by Execution clients.
When a smart contract wants to access data, it can only access the data that is stored on the blockchain itself at the time of the transaction triggering the contract's behavior.
For example, the Rocket Pool contract for creating a new minipool can check if you have enough RPL staked to create that new minipool because that information is recorded on-chain when you stake RPL.
What a contract can’t do is access historical states.
It can’t look back into the past - it can only see the present.
Storing things on-chain is generally expensive, and it isn’t done unless necessary for the functionality of the contracts.
Things like your cumulative RPL earned aren’t stored on-chain for this very reason.
Thus, there’s no direct call you can make to the contracts to ask “how much RPL have I earned since I started running the node?”.
Using an Archive Execution Client to regenerate past states and query them is prohibitively expensive for such an operation:
- Archive ECs are generally not viable for home stakers to run and maintain because of their massive storage requirements and sync times
- Accessing old states can take a long time if they're stored on large, but slow, spindle HDDs instead of fast SSDs
This is where events and logs come in.
Smart contracts have the ability to emit special messages called events.
Events contain well-defined, well-structured data that get stored (along with some metadata) by the Execution client.
They can be accessed at any time off-chain by querying the Execution client’s RPC route, but they are not accessible via the EVM (and thus, not accessible by smart contracts).
Events are used by smart contract developers for logging and debugging, and also to provide a way for light clients and third-party applications to query information about what happened on chain.
For example, the pre-Redstone `RocketRewardsPool` contract emitted an event called `RPLTokensClaimed` every time a Node Operator claimed monthly RPL rewards.
When emitted, this event logged:
- The address of the node claiming the rewards
- How much RPL was claimed
- The time of the claim
You can actually see all of these events on Etherscan if you’re curious.
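To make that concrete, here's a rough sketch (in Go, using the go-ethereum library) of how an off-chain tool can ask an Execution client for these events. The exact event signature and the contract address below are illustrative placeholders, not the canonical values:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Connect to the Execution client's RPC endpoint (address is illustrative)
	client, err := ethclient.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}

	// Events are filtered by "topics"; the first topic is the hash of the
	// event's signature. This signature is a guess for illustration only.
	topic := crypto.Keccak256Hash([]byte("RPLTokensClaimed(address,address,uint256,uint256)"))

	// Placeholder - substitute the real pre-Redstone RocketRewardsPool address
	rewardsPool := common.HexToAddress("0x0000000000000000000000000000000000000000")

	query := ethereum.FilterQuery{
		FromBlock: big.NewInt(13325229), // Rocket Pool's deploy block on Mainnet
		ToBlock:   nil,                  // nil means "up to the latest block"
		Addresses: []common.Address{rewardsPool},
		Topics:    [][]common.Hash{{topic}},
	}

	// This is the eth_getLogs RPC call; the Execution client answers it
	// off-chain, so no gas is spent and no contract code runs.
	logs, err := client.FilterLogs(context.Background(), query)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Found %d claim events\n", len(logs))
}
```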
The Smartnode uses these events for several important tasks.
In addition to scanning them to calculate your cumulative total earned RPL rewards, Oracle DAO nodes use them to crawl validator deposits to the Beacon Deposit contract when checking for the withdrawal credentials exploit (the "scrub check").
Events are effectively ways to record data about things that happen on the blockchain without needing to store that data on the chain itself, as long as the users of that data are off-chain and can query the Execution client’s RPC endpoint.
This makes them a cheap (emitting an event costs a trivial amount of gas compared to storing all of its data on-chain), reliable, and easily accessible way to extract data from the chain.
That being said, looking up and filtering through events can be computationally expensive.
Execution clients all use a data structure known as a Bloom filter to provide quick access to event logs.
This is actually part of the Ethereum standard itself; each block has a Bloom field for its logs specifically to make them efficient to filter.
While it’s generally quick, it has its limitations.
These limitations were hit with the new Redstone rewards system.
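For the curious, go-ethereum exposes that bloom check directly. The sketch below (endpoint, block number, and event signature are all illustrative) shows the key property: a negative answer lets a block be skipped outright, while a positive answer still requires fetching and filtering the block's logs.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545") // illustrative endpoint
	if err != nil {
		log.Fatal(err)
	}

	// Topic for a hypothetical event signature, used only for the check
	topic := crypto.Keccak256Hash([]byte("RPLTokensClaimed(address,address,uint256,uint256)"))

	// Fetch one block header; its Bloom field summarizes all logs in the block
	header, err := client.HeaderByNumber(context.Background(), big.NewInt(15523175))
	if err != nil {
		log.Fatal(err)
	}

	// A negative result is definitive: the block has no matching logs and can
	// be skipped. A positive result may be a false positive, so the block's
	// receipts still have to be fetched and filtered individually.
	fmt.Println("block may contain matching logs:", types.BloomLookup(header.Bloom, topic))
}
```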
The Redstone Rewards System
With the new Redstone rewards system, at each rewards interval, the Oracle DAO generates an artifact known as a Merkle Tree.
Without going into too much detail, this file essentially snapshots and records the amounts of RPL earned from collateral rewards and the ETH earned from the Smoothing Pool by each node operator for that interval.
This data is stored off-chain, so the contracts themselves don’t actually know how much RPL or ETH you earned for a given interval.
You have to tell them how much you earned when you claim your rewards.
Luckily, Merkle Trees work in a clever way that makes it very easy and efficient for contracts to verify that the amount you are trying to claim is correct, even though they don't know how much you can claim.
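For intuition, here is a generic sketch of that verification in Go. It uses SHA-256 and sorted-pair hashing for simplicity; the actual Rocket Pool tree's hash function and leaf encoding differ in detail, so treat this as the concept rather than the contracts' implementation:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyProof recomputes the root from a leaf and its sibling hashes. Pairs
// are hashed in sorted order so the verifier doesn't need to track whether
// each sibling sits on the left or the right.
func verifyProof(leaf [32]byte, proof [][32]byte, root [32]byte) bool {
	node := leaf
	for _, sibling := range proof {
		if bytes.Compare(node[:], sibling[:]) < 0 {
			node = sha256.Sum256(append(node[:], sibling[:]...))
		} else {
			node = sha256.Sum256(append(sibling[:], node[:]...))
		}
	}
	return node == root
}

func main() {
	// Build a trivial two-leaf tree and verify leaf A against its root
	a := sha256.Sum256([]byte("node A: 1.5 RPL"))
	b := sha256.Sum256([]byte("node B: 3.0 RPL"))
	root := a
	if bytes.Compare(a[:], b[:]) < 0 {
		root = sha256.Sum256(append(a[:], b[:]...))
	} else {
		root = sha256.Sum256(append(b[:], a[:]...))
	}
	fmt.Println("proof valid:", verifyProof(a, [][32]byte{b}, root))
}
```

The important property is that the verifier only ever stores the root hash: one 32-byte value on-chain is enough to check any claim against the whole off-chain tree.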
With that context out of the way, the new claim system needs to know the following things in order to claim rewards for an interval:
- The amount of RPL being claimed
- The amount of ETH being claimed
- The "Merkle proof", which is a series of hashes that combine with the above to verify the amount being claimed is correct
These three values are stored in a JSON file that the Oracle DAO uploads to IPFS for data resiliency, and which we mirror on GitHub in human-readable form for ease of access.
Feel free to take a look if you’re curious about these artifacts.
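To give a rough sense of the shape of that per-node data, here is a hypothetical Go struct it might deserialize into. The field names are illustrative guesses; the artifacts themselves are the canonical reference:

```go
package rewards

import "math/big"

// NodeRewardsInfo models the per-node claim data in an interval's rewards
// file. Field names here are illustrative, not the actual schema.
type NodeRewardsInfo struct {
	CollateralRpl    *big.Int `json:"collateralRpl"`    // RPL earned from collateral rewards
	SmoothingPoolEth *big.Int `json:"smoothingPoolEth"` // ETH earned from the Smoothing Pool
	MerkleProof      []string `json:"merkleProof"`      // hashes proving the amounts against the on-chain root
}
```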
The Issue
Hopefully that background context makes it clear that in order to know how many rewards a user has earned for a given interval, they cannot ask the contracts directly as they could with the previous rewards system; they need to have the JSON file produced by the Oracle DAO for that interval.
Doing this means they need to know where the file is hosted; since the Oracle DAO hosts these files on IPFS, and files on IPFS are addressable by their CID (their hash), each node needs to know the hash of the file in order to retrieve it.
When the Oracle DAO reaches consensus on a Merkle Tree (which they all generate independently), the last member to vote on that tree triggers it to be canonized as the official tree for that interval.
When it does this, it doesn’t store the CID on-chain.
Instead, to save on gas, it emits an event with the CID for the JSON file on IPFS.
For Smartnode operators, that means the node needs to look for this event when it notices a new rewards interval has begun.
At a high level, v1.6.1 of the Smartnode was designed like this:
- Check the index of the current rewards period (0 for the first one, 1 for the second, etc.), which is on-chain
- Check which intervals you've claimed rewards for already, which is on-chain
- If you haven't claimed for any intervals prior to the current one, make sure you have the rewards files for them
- If you don't have them (and you're in "Download" mode), get the event emitted when the Oracle DAO submitted the interval, which contains the CID of the rewards tree file on IPFS (which is off-chain)
The last step is the cause of the high CPU issue.
The Smartnode needs to look through the event logs of the new `RocketRewardsPool` contract to find the event the Oracle DAO emitted when it canonized the tree for that interval, as that event contains the CID needed to download the correct tree from IPFS.
Unfortunately, the Smartnode doesn't know when to start looking for the new tree (as the block Redstone was deployed on is not recorded on-chain), so it defaults to a "safe" well-known value: the block at which the Rocket Pool protocol itself was deployed, which is recorded on-chain.
For reference, on Mainnet, this was block 13,325,229.
It has been almost one year since then, and as of this writing, Mainnet is currently on block 15,523,175.
That means scanning for the first rewards interval event needs to go through over 2 million blocks to find it.
As clever and efficient as the Bloom filter is, this sheer amount of work - combined with the event log searching the Smartnode was already doing to calculate and display your cumulative RPL rewards earned on the Grafana dashboard - was too much for most Execution clients.
This information was being queried every 5 minutes (the default update interval for Grafana), and because it took longer than 5 minutes to calculate on most systems, the Execution client would suddenly be tasked with both the first round of the calculation and a new second round of that same calculation because the first one wasn’t done yet.
This caused a cascade of event log queries that brought the Execution client to its knees until the metrics gathering loop was stopped, which is why it was fixed by shutting down `rocketpool_node`, the process that runs the metrics gathering loop.
Unfortunately, this process is responsible for other key things, so this was only a temporary alleviation until Smartnode v1.6.2 was released, which contained a workaround for this problem.
v1.6.2 and the Short Term Workaround
Smartnode v1.6.2 included the following changes as a short-term mitigation to this issue:
- Disabled calculation of legacy RPL rewards during Grafana's metrics loop and the `rocketpool node rewards` command
- Modified the way the Smartnode looks for Redstone rewards events (see below)
The Smartnode will (temporarily) hard-code the block numbers where rewards events were emitted once the Oracle DAO has canonized the tree for an interval.
This way, it won’t have to search for these events; it already knows exactly where they are.
For new rewards intervals where the block isn’t hard-coded, it simply targets the block one rewards interval ahead of the last known hard-coded interval and searches a window of 20,000 blocks centered around this point.
If it can’t find the event there (because, for example, someone hasn’t updated the Smartnode in several months so there are several “unknown” rewards intervals), it will jump ahead another rewards interval and try again.
It will keep doing this until it reaches the head of the chain, at which point it will return an error.
This is a quick-and-dirty, but successful, way of finding the latest event, with one important caveat: it only works as long as the length of a rewards interval stays the same.
As soon as the interval length changes, multiple past intervals can no longer be reliably retrieved without hard-coding the block, so older Smartnodes aren't guaranteed to work if the user doesn't already have the latest files downloaded.
Downloading rewards in this case will require a Smartnode update (which has the specific blocks for each previous event hard-coded).
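In rough Go terms, the windowed search behaves something like this. The interval length in blocks and the `findEventIn` helper are illustrative stand-ins, not the actual Smartnode code:

```go
package rewards

import "fmt"

const searchWindow = 20000 // blocks, centered on the expected event location

// findEventIn stands in for a bounded eth_getLogs query over [from, to],
// reporting the block of the rewards submission event if one is present.
func findEventIn(from, to uint64) (uint64, bool) {
	// ... query the Execution client's logs for the submission event here ...
	return 0, false
}

// findRewardsEvent mirrors the v1.6.2 strategy: aim one interval past the
// last known event, scan a window centered on that target, and jump ahead
// another interval on a miss. Assumes intervalBlocks > searchWindow/2.
func findRewardsEvent(lastKnownBlock, intervalBlocks, headBlock uint64) (uint64, error) {
	target := lastKnownBlock + intervalBlocks
	for target-searchWindow/2 <= headBlock {
		from := target - searchWindow/2
		to := target + searchWindow/2
		if to > headBlock {
			to = headBlock
		}
		if block, found := findEventIn(from, to); found {
			return block, nil
		}
		// Nothing in this window - the event must be further ahead
		target += intervalBlocks
	}
	return 0, fmt.Errorf("reached the head of the chain without finding the event")
}
```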
Long-Term Solution
The most reliable thing to do, bar none, is to store a map of `interval -> rewards file CID` as an array directly on-chain.
Kane and I have already explored this idea, and we believe it should be added into the Atlas update (predicated on DAO vote approval).
Once the data is on-chain, this entire problem with event scanning goes away.
This is a long-term solution though. Until then, we should investigate more robust fixes that can reliably weather a rewards interval change.
Medium-Term Options
Option 1: Do Nothing
The first option is to simply keep the system as-is until Atlas is released.
While there is no date for Atlas's Mainnet release (and indeed, it is still very much in development), one could argue that there will only be a handful of rewards intervals between now and then, and that it simply isn't worth spending development time on a more robust fix until then.
It will require users to regularly update their Smartnode in order to capture any hard-coded rewards intervals, but one could also argue that node operators should be doing this anyway.
The main downside to this is that legacy cumulative RPL rewards will remain disabled until Atlas.
Option 2: Candidate Design for a Semi-Stateful Smartnode
The Smartnode has thus far been designed to be as stateless as possible.
It doesn’t record any information to the filesystem about the state of your node, its validators, or its activity; it procures all of this from the Execution client on-demand.
This way it always knows it has the correct data.
This was true before I was hired by the team, when Jake was still in charge of its architecture, and I’ve tried to stick to that paradigm as best as I can.
This might be a rare situation where we can break that rule, and record some data (particularly about cosmetic things that don’t affect actual node operation) off-chain on the node’s local filesystem.
The idea is that it can essentially “cache” a few things by calculating them once and then saving them so that it doesn’t have to look them up via regular on-chain scanning.
Importantly, those events will always be there in case the user needs to reconstruct or verify the cached data.
One candidate design for such a system would look something like this (a brief code sketch of the cache file follows the list):
- Construct a `node-state` file using YAML or JSON which will store cached data for the node.
- Add a `current-cached-block` parameter to this file. This will store the latest block for which the node has processed, and cached, relevant data. Start this at the Rocket Pool deploy block (13,325,229 on Mainnet).
- Add a `legacy-rpl-rewards` parameter to this file. This will store the cumulative RPL rewards earned pre-Redstone, for display purposes.
- Add the Redstone deployment block as a hard-coded parameter to the Smartnode.
- Upon `rocketpool_node` startup:
  - If `current-cached-block` is below the Redstone deployment block:
    - Crawl the event logs for the old pre-Redstone `RocketRewardsPool` contract as a background process.
    - Start at `current-cached-block`.
    - Sum all of the RPL claimed events to determine the cumulative pre-Redstone RPL rewards.
    - Update `current-cached-block` with the block number of each event, so it can resume if it gets interrupted later.
- During the routine 5-minute update loop of `rocketpool_node`:
  - Check if `current-cached-block` is greater than or equal to the Redstone deployment block. If not, ignore the following behavior.
  - Check for unclaimed intervals. If any exist, and we do not have the rewards files for them:
    - Crawl the event logs of the new (post-Redstone) `RocketRewardsPool` contract as a background process.
    - Start at `current-cached-block`.
    - Look for the next rewards submission event (the first one that has not been downloaded yet).
    - When found, update `current-cached-block` to that block number.
    - Use the CID in the event to download the file. If it fails, let the logic run during the next cycle - it will try again immediately since `current-cached-block` already contains the block number for the missing interval.
    - Continue until all rewards interval files have been downloaded.
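Here is a minimal sketch of what the cache file plumbing could look like, assuming JSON for concreteness. All names are illustrative, not final Smartnode identifiers:

```go
package state

import (
	"encoding/json"
	"os"
)

// NodeState is a sketch of the proposed node-state cache file
type NodeState struct {
	// Latest block for which relevant data has been processed and cached
	CurrentCachedBlock uint64 `json:"current-cached-block"`
	// Cumulative pre-Redstone RPL rewards (in wei), for display purposes only
	LegacyRplRewards string `json:"legacy-rpl-rewards"`
}

// Load reads the cache file, falling back to the Rocket Pool deploy block
// when the file doesn't exist yet (i.e. a fresh node)
func Load(path string) (NodeState, error) {
	state := NodeState{CurrentCachedBlock: 13325229} // Mainnet deploy block
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return state, nil
	}
	if err != nil {
		return state, err
	}
	err = json.Unmarshal(data, &state)
	return state, err
}

// Save persists the cache after each processed event so an interrupted crawl
// can resume from current-cached-block instead of starting over
func Save(path string, state NodeState) error {
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0644)
}
```

Saving after every processed event is what makes the crawl resumable: an interruption at any point loses at most the work done since the last save.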
In theory, other things could be added to this state / cache file as well if the community has suggestions for things that would reduce the metrics-querying load on the EC and BN while maintaining data resilience.
Option 3: Current Behavior + Forward Tree Crawling
The third option is fairly easy in terms of implementation and CPU load.
It is effectively what we have now, but instead of jumping ahead and aiming at specific “windows” to search for event logs of each missing rewards interval, it just traverses the logs starting at the last known hard-coded block number and continuing until the event is found (or the head of the chain is reached).
This would cause some initial CPU load while it performed the initial traversal, but it would end once it found the relevant event and downloaded the missing file.
If the download fails, it would repeat this work (since it is stateless and doesn't store the block at which it previously found the missing interval's event), but this would only be a problem if it constantly fails to download the file, which is indicative of other problems anyway.
Note that this wouldn’t provide a way to retrieve legacy RPL rewards.
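For concreteness, the traversal described above could look something like this, with an illustrative chunk size and the same hypothetical `findEventIn` helper standing in for a bounded eth_getLogs query:

```go
package rewards

const chunkSize = 10000 // illustrative eth_getLogs page size

// findEventIn stands in for a bounded eth_getLogs query over [from, to],
// reporting the block of the rewards submission event if one is present.
func findEventIn(from, to uint64) (uint64, bool) {
	// ... query the Execution client's logs for the submission event here ...
	return 0, false
}

// crawlForward walks from the last known hard-coded block to the head of the
// chain in fixed-size chunks, stopping as soon as the event is found.
func crawlForward(startBlock, headBlock uint64) (uint64, bool) {
	for from := startBlock; from <= headBlock; from += chunkSize {
		to := from + chunkSize - 1
		if to > headBlock {
			to = headBlock
		}
		// Each chunk is one bounded eth_getLogs call, keeping the load on
		// the Execution client predictable
		if block, found := findEventIn(from, to); found {
			return block, true
		}
	}
	return 0, false
}
```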
Option 4: Something Else Entirely?
If you have an idea for how to solve this problem beyond the solutions above, feel free to include it in the comments here and we can all riff on it together.
Discussion
Hopefully I’ve provided enough context here for you to understand the problem, the short-term fix, and the options for a longer-term fix until we can resolve it directly in the contracts.
Thanks for taking the time to read through this, and I look forward to hearing feedback from everybody!