Project: RocketMEVMonitor
What is the work being proposed?
The core of this project is a collection of scripts and a database to collect, collate, and archive data from all known, public, and active MEV relays in the Ethereum ecosystem (both those authorized and those unauthorized by RocketPool for use by node operators). This data would then be republished and made freely available and accessible to researchers investigating and monitoring for MEV/priority-fee theft.
The data to be collected and republished includes:
- All ‘delivered payloads’ (the blocks that the relay reports as having the winning bid selected by the validator) reported by all known public relays. The list of relays specified by EthStaker will be the starting point.
- All ‘top bids’ from block builders, for every slot, for every known, public relay. This would include an entry if the relay reports no bids were received for a slot.
- ‘All bids’ a relay received for a slot. Relays can receive hundreds of bids for every slot, but since this is a very large amount of data with little obvious long-term utility, it would only be stored with a limited retention period of 60-90 days, depending on storage space impact.
- A convenience table with basic data on every block committed and finalized to Ethereum including the proposer’s validator index, the block header fee recipient, the priority fees paid in the block and metadata on the last transaction in the block.
- A recording of the registered validators assigned as proposers for every epoch, for every known relay. This would allow for monitoring of fee recipient changes or anomalies.
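The relay data listed above is exposed by the relays' public data APIs; the endpoint path below follows the Flashbots relay data API convention, and the relay hostname and slot number are placeholders. A minimal polling sketch in Ruby:

```ruby
require "json"
require "net/http"
require "uri"

# Build the data-API URL for a relay's delivered payloads at a given slot.
# The /relay/v1/data/bidtraces/proposer_payload_delivered path follows the
# Flashbots relay data API; individual relays may differ slightly.
def delivered_payloads_url(relay_host, slot)
  URI("https://#{relay_host}/relay/v1/data/bidtraces/proposer_payload_delivered?slot=#{slot}")
end

# Fetch and parse the delivered-payload bid traces for a slot (network call).
def fetch_delivered_payloads(relay_host, slot)
  JSON.parse(Net::HTTP.get(delivered_payloads_url(relay_host, slot)))
end
```

The collection agent would loop over every known relay for each finalized slot and upsert the returned bid traces into the database.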
The data would be made available as compressed JSON or CSV (or both) in a Cloudflare R2 bucket (or a similar low-cost web2 service). Occasional full SQL database dumps are another possible publication format.
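As a sketch of the export step, gzip-compressed JSON per epoch can be produced with the Ruby standard library alone (the epoch-based file naming here is purely illustrative):

```ruby
require "json"
require "zlib"

# Write an array of record hashes as gzip-compressed JSON.
# Returns the path written. The file-name scheme is illustrative only.
def export_epoch_json(records, epoch, dir: ".")
  path = File.join(dir, "delivered_payloads_epoch_#{epoch}.json.gz")
  Zlib::GzipWriter.open(path) { |gz| gz.write(JSON.generate(records)) }
  path
end

# Read back an exported file (useful for verification before upload).
def read_epoch_json(path)
  JSON.parse(Zlib::GzipReader.open(path, &:read))
end
```

The same records could be emitted as CSV with the stdlib `csv` library for the CSV variant of the archive.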
Data would be collected and published in close to real-time, along with the finalization of every epoch.
The data would be provided with PGP signatures to allow verification of provenance. As a stretch goal, a Merkle log in the same vein as a certificate transparency log could be developed and published to detect tampering with past data.
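The PGP signing step could shell out to gpg for detached, armored signatures. A sketch, where the key ID is a placeholder and gpg plus the private key are assumed to be present on the host:

```ruby
# Build the gpg invocation for a detached, armored signature of a file.
# "ARCHIVE_SIGNING_KEY" is a placeholder key ID.
def gpg_sign_command(path, key_id: "ARCHIVE_SIGNING_KEY")
  ["gpg", "--batch", "--yes", "--local-user", key_id,
   "--armor", "--detach-sign", "--output", "#{path}.asc", path]
end

# Run the signing command (requires gpg and the private key on the host).
def sign_export!(path)
  system(*gpg_sign_command(path)) or raise "gpg signing failed for #{path}"
end
```

Consumers would then verify provenance with `gpg --verify file.asc file` against the published public key.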
The code would include a docker-compose script to enable easy deployment of the database and the data collection and publication agents by third parties.
Is there any related work this builds off of?
Yes. The initial development was done in pursuit of GMC Bounty BA032304. This work completed much of the data collection and indexing side of this project.
Will the results of this project be entirely open source?
Yes. GPLv3 or AGPLv3 (TBD). It is explicitly expected and desired that other parties operate instances of this collection and publication agent.
Benefits - enter N/A where appropriate
Background: MEV relays are trusted entities in the Ethereum staking ecosystem. The data they publish via their data APIs, which allows verification of their work, can disappear without much warning, and in many cases has, making future research into their past activities difficult. For example:
- Relayoor.wtf was active shortly after the merge and shut down at some point. All their data is gone.
- The BloxRoute relays seem to prune their bid data after about 1M blocks.
- The BloxRoute Ethical relay shut down, and that data is no longer available from them (although I collected much of this data from before the shutdown).
- There seem to be multiple instances of blocks that appear to have been built by a third-party block builder, and likely published by a relay, but the relay can no longer be identified via any known, active relay’s proposer_payload_delivered endpoint.
- Flashbots had been providing archival access to their data in an S3 bucket, but they have not updated it for several months.
- All relays seem to aggressively rate-limit requests to their data APIs, drastically slowing research into historical bids and payloads.
By having a third party collect, archive and republish this data in bulk, and making the tools available for anyone else to do the same, the trust assumptions we make of the relays can be constrained.
How does this help people looking to stake ETH for rETH?
By collecting and making MEV relay data available and accessible, MEV theft might be more easily discovered and penalized. This would give prospective rETH holders more confidence that RocketPool can more effectively prevent and penalize MEV theft.
How does this help rETH holders?
For the same reasons as above. rETH holders would have confidence that their capital is earning a fair return because the data to enable effective penalization of MEV theft is available.
How does this help people looking to run a Rocket Pool node for the first time?
MEV relays are “Trusted” entities. As such, they have the potential to engage in malicious conduct that could improperly harm and penalize operators. Collecting and archiving their data substantially constrains that trust, because their data remains relatively easy to access via third parties.
How does this help people already running a Rocket Pool node?
For the same reasons as above. Archiving relay data allows for constraining the trust assumptions placed on the relays.
How does this help the Rocket Pool community?
In the event of an MEV theft penalization dispute, good historical data could provide evidence to help properly adjudicate the matter fairly and possibly prevent a painful split in the community.
How does this help RPL holders?
By helping rETH holders and node operators as described above, RocketPool as a whole is improved. What is good for RocketPool is good for RPL.
What other non-RPL protocols, DAOs, projects, or individuals, would stand to benefit from this grant?
- All other staking services and protocols that are vulnerable to operator-induced MEV theft would potentially benefit from better access to, and availability of, this data.
- Other MEV researchers who could benefit from easy access to this data.
Will the resulting project be open source?
Yes. GPLv3 or AGPLv3 (TBD)
Team
Who is doing the work?
Me - NonFungibleYokem (aka yokem55, yokem)
What is the background of the person(s) doing the work? What experience do they have with such projects in the past?
I am a long-term RocketPool community member and node operator.
My real-world job has me doing a lot of networking and sysadmin work, along with a substantial amount of Postgres DBA work. Most of my coding experience is in Ruby. As such, the vast majority of the project would be written in Ruby, utilizing the Sequel ORM to interact with the database, and miscellaneous libraries for interacting with Ethereum and the beacon API, the relay APIs, generating the static bucket index, and signing and publishing the data to the cloud storage bucket.
What is the breakdown of the proposed work, in terms of milestones and/or deadlines?
Milestone 1 - Code completion (~14-21 days):
- Addition of a docker-compose script and docker file to automate deployment by third parties.
- Code to collect and index validator relay registrations for every upcoming epoch.
- Code to export the data into a suitable format (json, csv, raw sql dumps at regular intervals).
- Code for cryptographic signing of the exported data.
- Code to automate synchronization of the data with the cloud storage service.
- Code to update and republish a friendly index that allows for recursive downloading of the data.
- Public publication of code on Github.
Milestone 2 - Deployment and move to production (~3 days)
- Setup and deployment of a dedicated box to host and operate the postgres db and collection/publication agents.
- Importing of the pre-collected data into the production DB.
- Initial seeding and publication of previous historical data.
- Verification of the estimated long term cloud storage costs.
- Announcement of public availability on discord, twitter, etc.
Milestone 3 - 6 months of operations (6 months)
- Regular operation for 6 months, collecting and publishing the data (to start).
Possible Stretch Goals (~3 months):
- Development of a graphql query interface to the database for arbitrary queries.
- Development of a Merkle logging system that could flag tampering with data.
- Make the Grafana monitoring dashboard a publicly accessible resource.
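The Merkle-log stretch goal could start from a simple binary hash tree over each published batch of files. A minimal sketch, where the leaves (e.g. file digests) and the odd-node handling are illustrative choices, not a committed design:

```ruby
require "digest"

# Compute a binary Merkle root over a list of leaf strings (e.g. the
# contents or digests of published files). When a level has an odd
# number of nodes, the last node is promoted unchanged.
def merkle_root(leaves)
  raise ArgumentError, "no leaves" if leaves.empty?
  level = leaves.map { |l| Digest::SHA256.hexdigest(l) }
  until level.size == 1
    level = level.each_slice(2).map do |a, b|
      b ? Digest::SHA256.hexdigest(a + b) : a
    end
  end
  level.first
end
```

Publishing the signed root for each batch would let anyone detect after-the-fact modification of archived files, in the same spirit as a certificate transparency log.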
How is the work being tested? Is testing included in the schedule?
- There will be a script to perform regular, randomized spot checks by comparing archived data to live data from known relay api endpoints. A high match rate (99+%) should indicate that the data is comprehensive.
- Liveness monitoring and alerting via a private Grafana dashboard.
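The spot-check comparison reduces to a pure function over two record sets; a sketch, where keying on the slot and block_hash fields of the relay bid-trace format is an assumption:

```ruby
require "set"

# Compare archived records against live records keyed by (slot, block_hash);
# return the fraction of archived records confirmed by the live API.
def match_rate(archived, live)
  return 1.0 if archived.empty?
  key = ->(r) { [r["slot"], r["block_hash"]] }
  live_keys = live.map { |r| key.call(r) }.to_set
  archived.count { |r| live_keys.include?(key.call(r)) }.to_f / archived.size
end
```

A scheduled job would sample random slots, fetch live data from each relay, and alert if the rate drops below the 99% threshold.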
How will the work be maintained after delivery?
There will be ongoing maintenance of the code to add and remove known relays and to make adjustments accommodating changes to the relay data APIs and any other changes caused by Ethereum hard forks.
Payment and Verification
What is the acceptance criteria?
- Publication of a test script that can be used by anyone to test availability of the data at will.
- Completion of milestone 2 demonstrating the availability of the code and data.
- GMC-performed spot checks of the availability and liveness of the data.
- Completion of stretch goals and publication of the additional needed code.
What is the proposed payment schedule for the grant? How much USD $ and over what period of time is the applicant requesting?
- $5000 after completion of Milestone 2.
- $200 per cycle to cover ongoing cloud storage hosting and maintenance costs for 6 cycles. ($1200 total)
- $2500 for completion of the stretch goal milestone.
- If community feedback indicates that the project is useful, I’ll reapply for additional months to support the ongoing hosting costs.
How will the GMC verify that the work delivered matches the proposed cadence?
- Skilled members of the GMC could access the data, review the published code, and utilize the test script to spot check availability.
- Other community members could operate their own instance of the project and compare the results.
What alternatives or options have been considered in order to save costs for the proposed project?
- I intend to use a low-cost cloud storage vendor. Cloudflare R2 likely offers the best balance between vendor reputability and cost.
- I’ll be using the colocation space I already have to host the deployed box hosting the database and collection/publication agents.
- I could investigate the suitability of more crypto-oriented web3 storage solutions, but those feel too immature relative to the goals of this project at this time.
Conflict of Interest
Does the person or persons proposing the grant have any conflicts of interest to disclose? (Please disclose here if you are a member of the GMC or if any member of the GMC would benefit directly financially from the grant).
No. I am a member of the IMC, but I do not believe that this would pose any conflict with that work.
Will the recipient of the grant, or any protocol or project in which the recipient has a vested interest (other than Rocket Pool), benefit financially if the grant is successful?
No. Well … yes. Ethereum benefits. MEV relays are some of the most centralized bits of Ethereum right now. Their data is largely stuck behind their slow, and rate-limited web2 apis. Making this data more easily accessible benefits Ethereum. And as an Eth bagholder, that is to my benefit.