In this repo you will find how Antipode fixes the cross-service inconsistency described by Facebook, simplified in the form of a microbenchmark that runs on top of AWS Lambda, depicted in the following picture.
In this application, users can upload posts and followers receive notifications. Internally, the application comprises two key services, namely:
- a Writer service (comprised of `post-upload` and `post-storage` services) that works as a proxy for the clients and is responsible for storing and processing the contents of posts
- a Reader service (comprised of `notifier` and `follower-notify` services) in charge of disseminating notification events that notify followers of new posts
In our implementation, each service corresponds to a Lambda function, which accesses off-the-shelf datastores. Each external client request spawns a Writer call, which writes the new post to post-storage and then creates a new notification in the notifier.
Meanwhile, a new Reader is spawned when a new notifier replication event is received.
For the off-the-shelf datastores we used combinations of MySQL, DynamoDB, S3, and Redis for storing posts; and SNS, AMQ, and DynamoDB for notification events.
Cross-service inconsistencies can occur in this application: followers in Region B can be notified of posts that do not yet exist in that region. In other words, if reading a post returns an object not found error, then an inconsistency occurred.
Antipode solves this violation by placing a barrier right after the Reader receives the notification replication event.
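The sketch below illustrates this failure mode and where the barrier sits. The names (`post_storage`, `cscope`, `barrier`) are illustrative only, not the repo's actual API:

```python
# Illustrative reader-side logic (hypothetical names, not the repo's API).
def reader_handler(notification, post_storage, antipode_enabled=False):
    if antipode_enabled:
        # Antipode's barrier: block until the writes the notification
        # depends on are visible in this region before reading.
        notification["cscope"].barrier()

    post = post_storage.read(notification["post_id"])
    if post is None:
        # Without the barrier this branch is reachable: the notification
        # replicated faster than the post -- a cross-service inconsistency.
        return {"consistent": False}
    return {"consistent": True}
```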
- Docker
- Python 3.8
- Install requirements: `pip3 install -r requirements.txt`
- Configure AWS according to the instructions below.
Start by configuring your AWS credentials:
- Install the AWS CLI tools: `aws` (version 2)
- Configure your local authentication profile: `aws configure`
- Copy the generated credentials to the application path: `cp ~/.aws/credentials .`
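As a quick sanity check (not a repo step), you can confirm the credentials are visible to boto3 before deploying anything:

```python
import boto3

# Uses the default credential chain (~/.aws/credentials); the local copy
# made above is the one shipped with the application.
print(boto3.client("sts").get_caller_identity()["Account"])
```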
We assume the following regions:
- For EU we use the Europe (Frankfurt) datacenter and the `eu-central-1a` availability zone
- For US we use the US East (N. Virginia) datacenter and the `us-east-1a` availability zone
- For AP we use the Asia Pacific (Singapore) datacenter and the `ap-southeast-1a` availability zone
For each resource configuration, do not forget to set up the correct endpoints in the corresponding sections (lambda and datastores) of the `connection_info.yaml` file.
WARNING: In AWS, go to Service Quotas, AWS Lambda, and make sure the applied quota value for concurrent executions is set to 1000 in all regions listed in the following sections.
- Create a role named `antipode-cloudformation-admin` (name is defined at the end):
  - Trusted Entity Type: `AWS Service`
  - Use Case: search and select `CloudFormation`
  - Next, add the following permission policy: `AdministratorAccess`
- Create a role named `antipode-lambda-admin` (name is defined at the end):
  - Trusted Entity Type: `AWS Service`
  - Use Case: `Lambda`
  - Next, add the following permission policies: `AdministratorAccess`, `AmazonDynamoDBFullAccess`, `AmazonEC2FullAccess`, `AmazonElastiCacheFullAccess`, `AmazonSNSFullAccess`, `AmazonSQSFullAccess`, `AmazonVPCFullAccess`, `AmazonMQFullAccess`, `AWSLambda_FullAccess`
- Create a role named `antipode-s3-admin` (name is defined at the end):
  - Trusted Entity Type: `AWS Service`
  - Use Case: search and select `S3`
  - Next, add the following permission policy: `AmazonS3FullAccess`
- Add the ARNs of the first two roles at the beginning of `connection_info.yaml`.
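To double-check the roles before filling in `connection_info.yaml`, a small boto3 snippet can print their ARNs (role names as created above):

```python
import boto3

iam = boto3.client("iam")
for name in ("antipode-cloudformation-admin", "antipode-lambda-admin",
             "antipode-s3-admin"):
    # Raises NoSuchEntityException if the role is missing
    print(name, "->", iam.get_role(RoleName=name)["Role"]["Arn"])
```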
- Go to each reader region (`us-east-1`, `ap-southeast-1`) and to the AWS SQS dashboard
- Create a queue with the following parameters:
  - Standard type
  - Name: `antipode-lambda-eval`
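The same queues can also be created from a script; a minimal boto3 sketch, assuming the reader regions above:

```python
import boto3

for region in ("us-east-1", "ap-southeast-1"):
    sqs = boto3.client("sqs", region_name=region)
    # Standard queues are the default type
    print(region, sqs.create_queue(QueueName="antipode-lambda-eval")["QueueUrl"])
```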
As a tip, use the same name for all objects; it is easier to track. We use `antipode-mq`.
- Create a VPC with a unique IPv4 CIDR block, distinct from the ones used in other regions, as exemplified in the connection info file:
  - eu: `50.0.0.0/16`
  - us: `51.0.0.0/16`
  - ap: `52.0.0.0/16`
  - MAIN CONCERN: Amazon MQ peering connections WILL NOT WORK ON OVERLAPPING CIDR BLOCKS ACROSS REGIONS. Hence choose a unique one for each region's VPC
- After creating the VPC, select it, click on `Actions`, go to `Edit VPC settings` and enable DNS hostnames
- Create two subnets, one for each Availability Zone (`a` and `b`). For example:
  - eu: `50.0.0.0/20`, `50.0.16.0/20`
  - us: `51.0.0.0/20`, `51.0.16.0/20`
  - ap: `52.0.0.0/20`, `52.0.16.0/20`
  - MAIN CONCERN: Amazon ElastiCache (Redis) requires an additional subnet in a different AZ for the additional replica.
  - IMPORTANT REMINDER: the subnet ids used in the connection info file correspond to the first one for each zone
- Go to Security Groups and select the default one for the created VPC.
  - Inbound rules: add 2 rules for `ALL TRAFFIC` to Any IPv4 (0.0.0.0/0) and Any IPv6 (::/0). Make sure you also have a rule for the same SG ID
  - Outbound rules: add 2 rules for `ALL TRAFFIC` to Any IPv4 (0.0.0.0/0) and Any IPv6 (::/0). Make sure you also have a rule for the same SG ID
- Create an Internet Gateway
  - After creating it, go to `Actions` and attach it to the VPC
- Go to Route Tables and select the one created (check the matching VPC id)
  - Go to Edit Routes and add an entry for `0.0.0.0/0` with the created internet gateway as the target - select Internet Gateway and the id will appear
- Go to Endpoints and create an endpoint for each AWS Service needed. Make sure you select the correct VPC, the subnet for the `a` Availability Zone, and the Security Group:
  - Reader (`eu-central-1`): SQS
  - Writer (`us-east-1`, `ap-southeast-1`): SNS, Dynamo (Gateway)
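If you prefer to script these networking steps instead of clicking through the console, a minimal boto3 sketch for one region could look like this (CIDRs and region follow the examples above; endpoint creation and error handling are omitted):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# VPC with the region's unique CIDR, plus DNS hostnames enabled
vpc_id = ec2.create_vpc(CidrBlock="50.0.0.0/16")["Vpc"]["VpcId"]
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

# One subnet per Availability Zone (`a` and `b`)
ec2.create_subnet(VpcId=vpc_id, CidrBlock="50.0.0.0/20",
                  AvailabilityZone="eu-central-1a")
ec2.create_subnet(VpcId=vpc_id, CidrBlock="50.0.16.0/20",
                  AvailabilityZone="eu-central-1b")

# Internet gateway attached to the VPC, plus the default route
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
rt_id = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
)["RouteTables"][0]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0",
                 GatewayId=igw_id)
```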
Before starting, in each of the zones where you will be deploying MySQL, go to the AWS RDS dashboard:
- On the left side, click on `Parameter groups` and create a new one (e.g. `aurora-mysql5.7`)
- Although you can keep the default parameters, you might want to increase `max_connections`
Now we set up the cluster itself (reference):
- Go to the writer zone `eu-central-1`
- Go to the AWS RDS dashboard and click on `Create Database`
- Select `Standard Create`
  - Engine type: Amazon Aurora with MySQL compatibility
- On the filters enable the `Show versions that support the global database feature` filter
- Select a MySQL version that supports `Global Database` (info on the right side panel), e.g. Aurora MySQL 5.7 v2.11.2
- Select the `Production` template
- DB cluster identifier: `antipode-lambda-eu`
- For credentials you can use the following:
  - Master Username: `antipode`
  - Master Password: `antipode`
- Select a memory optimized machine (e.g. `db.r6.large`). You can tick `Include previous generations` for older and cheaper instances.
- Do not create a Multi-AZ deployment
- Choose the `Default VPC`. Warning: do not try to change the `antipode-mq` VPC to support RDS by adding more subnets -- use a different one.
- Enable `Public access`
- Choose the existing `allow-all` VPC security group. If it's not created, create it with:
  - ALL TRAFFIC open for all IPv4 and IPv6, in inbound and outbound
  - A rule allowing the security group itself, in inbound and outbound
- Select the AZ ending in `a`
- On `Additional configuration` make sure the database port is `3306`
- On `Monitoring`:
  - Disable `Performance Insights`
  - Disable `Enhanced Monitoring` in the additional configurations
- On `Additional configuration`:
  - Leave the `Initial database name` blank as we will create it later
  - Set the `DB cluster parameter group` and the `DB parameter group` to the previously created parameter group
  - Disable Encryption
  - Disable `Auto minor version upgrade`
- Wait for the database instances to be created
- In `Databases`, select the top-level entry named `antipode-lambda-eu` with type `Regional cluster`. Click on Actions and `Add AWS region`. You will get to an `Add Region` panel where you can set up the new replica:
  - Global database identifier: `antipode-lambda`
  - Select the secondary region, e.g. `US East (N. Virginia)`, which would mean the `us` zone
  - Select the same model of machine selected in the writer zone (e.g. `db.r6.large`)
  - Do not create a Multi-AZ deployment
  - Choose the `Default VPC`.
  - Enable `Public access`
  - Select the `allow-all` VPC security group. If it's not created, create it with:
    - ALL TRAFFIC open for all IPv4 and IPv6, in inbound and outbound
    - A rule allowing the security group itself, in inbound and outbound
  - Select the AZ ending in `a`
  - On `Additional configuration` make sure the database port is `3306`
  - Keep `Turn on global write forwarding` disabled
  - On `Additional configuration`:
    - DB instance identifier, e.g. `antipode-lambda-us-instance`
    - DB cluster identifier: `antipode-lambda-us`
    - Disable `Performance Insights`
    - Disable `Enhanced Monitoring`
    - Disable `Auto minor version upgrade`
- When everything is created, run `./antipode_lambda clean`, which will automatically create the MySQL tables
- Go to the `connection_info.yaml` file and fill out the cluster endpoints for each zone. To get the endpoints go to the RDS dashboard and on the `Databases` list select the corresponding instance. On the panel below, select the `Connectivity & security` tab and copy the value under `Endpoint`.
  - The RDS Writer instance corresponds to the `writer` instance endpoint
  - The RDS Reader instance corresponds to the `reader` instance endpoint
- Finally run `./antipode_lambda clean -n mysql` so we create the database and tables
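To confirm the endpoints you copied are reachable, a quick check with a MySQL driver (pymysql here is an assumption; any client works) against the writer endpoint:

```python
import pymysql

# Credentials as chosen during cluster creation
conn = pymysql.connect(host="<writer cluster endpoint>", port=3306,
                       user="antipode", password="antipode")
with conn.cursor() as cur:
    cur.execute("SHOW DATABASES")
    print(cur.fetchall())
conn.close()
```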
- Go to the `eu-central-1` zone and to the AWS SNS dashboard
- Go to Topics and create a new one with the following parameters:
  - Standard type
  - Name: `antipode-lambda-notifications`
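Equivalently with boto3 (Standard is the default topic type); the printed ARN is what goes into `connection_info.yaml`:

```python
import boto3

sns = boto3.client("sns", region_name="eu-central-1")
print(sns.create_topic(Name="antipode-lambda-notifications")["TopicArn"])
```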
- Go to the AWS S3 dashboard and create buckets in all zones. Note that bucket names are globally unique, so you will probably need to use a different one
  - Name: `antipode-lambda-posts-<region>`
  - Enable versioning
- Go to the bucket in the primary region. Go to the `Management` tab and create a replication rule from the bucket in that region to the other regions' buckets
  - Name: `to-reader-<secondary region>`
  - Rule scope: apply to all objects
  - On Destination click `Browse S3` and find the bucket named: `antipode-lambda-posts-<secondary region>`
  - Use the `antipode-lambda-s3-admin` IAM role. This is a role that grants S3 admin access for the operations needed
  - Do not select RTC
  - When created do not choose `Replicate existing objects`
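Bucket creation with versioning can also be scripted; a boto3 sketch assuming the three example regions (the replication rules themselves are easier to configure in the console):

```python
import boto3

for region in ("eu-central-1", "us-east-1", "ap-southeast-1"):
    s3 = boto3.client("s3", region_name=region)
    bucket = f"antipode-lambda-posts-{region}"  # names must be globally unique
    # us-east-1 rejects an explicit LocationConstraint
    cfg = ({"CreateBucketConfiguration": {"LocationConstraint": region}}
           if region != "us-east-1" else {})
    s3.create_bucket(Bucket=bucket, **cfg)
    s3.put_bucket_versioning(Bucket=bucket,
                             VersioningConfiguration={"Status": "Enabled"})
```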
- On each region create the following tables:
  - `posts`: with `k` as partition key
  - `notifications`: with `k` as partition key
- Update the `connection_info.yaml` file with the new table names
- For the remaining settings keep the defaults. In table settings, select customize settings and change `Read/Write capacity settings` to `On-demand`
- After creation, go to the DynamoDB dashboard in the primary region and select `Tables` on the left-hand panel. For each table do the following:
  - Go to `Global Tables`
  - Create a replica in the desired region
  - Double check in the secondary region that the tables got created
- For the `notifications` tables in the secondary regions, go to `Exports and streams` and obtain the stream ARN to be configured in the `connection_info.yaml` file
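A boto3 sketch of the table creation (the string key type is an assumption; global-table replicas and streams still follow the console steps above):

```python
import boto3

for region in ("eu-central-1", "us-east-1", "ap-southeast-1"):
    ddb = boto3.client("dynamodb", region_name=region)
    for table in ("posts", "notifications"):
        ddb.create_table(
            TableName=table,
            AttributeDefinitions=[{"AttributeName": "k", "AttributeType": "S"}],
            KeySchema=[{"AttributeName": "k", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST",  # the On-demand capacity mode
        )
```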
- Go to the AWS ElastiCache dashboard and on the left-side panel select `Global Datastores`.
- Click on `Create global cluster`. Start with the primary (writer) zone. If you are adding a zone to an existing cluster just go to the dashboard and click on `Add zone`. The properties are similar for the other zones you add to the cluster. Configure each zone in the `antipode-lambda` cluster:
  - Keep `Cluster mode` disabled
  - For the `Global Datastore info` use `antipode-lambda`
  - Create a regional cluster with the region's name, e.g. `antipode-lambda-eu`
  - Set `Engine version` to `6.2` (a different version should not impact results)
  - Set `Port` to `6379` (or the one you define in `connection_info.yaml`)
  - Set the node type to e.g. `cache.r5.large`
  - Set `Number of replicas` to 1
  - Create a new Subnet group:
    - Name: `antipode-mq-ec`
    - Select the previously created VPC (`antipode-mq`)
  - Confirm that the `Availability Zone placements` are the same as the AZs from the subnet group. Make the primary the `a` AZ.
  - Disable `Auto upgrade minor versions`
  - Disable backups
  - Select the default Security Group for the chosen VPC
  - If following the AWS form you should create the secondary (reader) zone next with similar configurations as before but in a new zone.
- Finally, set the endpoints in the `connection_info.yaml` file. Go back to the AWS ElastiCache dashboard, click on `Redis clusters` on the left-hand panel, click on the regional cluster name, e.g. `antipode-lambda-eu`, and copy the `Primary endpoint` without the port. Do the same on the secondary regions for their respective clusters.
WARNING: We found that for some new accounts using AWS ElastiCache, you might have to create an EC2 instance in the same zone as your cluster and then perform an initial request to "unlock" the zone for ElastiCache.
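Once the endpoints are in `connection_info.yaml`, you can probe replication with the `redis` Python package (a sketch; Global Datastores require in-transit encryption, hence `ssl=True`):

```python
import redis

# Write through the primary (writer) region's endpoint...
primary = redis.Redis(host="<primary endpoint>", port=6379, ssl=True)
primary.set("probe", "hello")

# ...and read it back through a secondary region's endpoint.
secondary = redis.Redis(host="<secondary endpoint>", port=6379, ssl=True)
print(secondary.get("probe"))  # may lag briefly behind the write
```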
- Using the previously created VPC, you have to add peering between the reader and writer zones
- Check the following material for more details:
  - https://docs.aws.amazon.com/vpc/latest/peering/create-vpc-peering-connection.html
  - https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-routing.html

HUGE WARNING: WILL NOT WORK WITH VPCs WITH OVERLAPPING CIDRS
- Go to the secondary zone and create a new Peering Connection
  - Name: `antipode-mq-<primary>-<secondary>` (e.g. `antipode-mq-eu-us`)
  - Select the previously created VPC
  - Then select the primary zone and paste the previously created VPC id
- Go to the Peering Connections on the primary zone and accept the pending request (you might want to change the name as well)
- On both zones go to the Routing Table. We will match the pairs of CIDR blocks:
  - On zone `REGION-A` add the entry: `<REGION-B CIDR block> -> pcx-id (peering connection)`
  - On zone `REGION-B` add the entry: `<REGION-A CIDR block> -> pcx-id (peering connection)`

At the end of the whole setup, the primary should have a configuration similar to this one:

```
50.0.0.0/16   local (self)
51.0.0.0/16   pcx-id (peering connection to secondary, e.g. eu-us)
52.0.0.0/16   pcx-id (peering connection to secondary, e.g. eu-sg)
0.0.0.0/0     igw-id (internet gateway)
```

(you might have more entries from the endpoint configurations)

And the secondaries should have a configuration similar to this one:

```
52.0.0.0/16   local (self)
50.0.0.0/16   pcx-id (peering connection to primary, e.g. eu-sg)
0.0.0.0/0     igw-id (internet gateway)
```

(you might have more entries from the endpoint configurations)
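The peering request, acceptance, and routes can also be driven by boto3; a sketch for the eu-us pair with the example CIDRs (the placeholder ids are your own):

```python
import boto3

secondary = boto3.client("ec2", region_name="us-east-1")
pcx = secondary.create_vpc_peering_connection(
    VpcId="<secondary vpc id>", PeerVpcId="<primary vpc id>",
    PeerRegion="eu-central-1")["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accept on the primary side (the request may take a moment to propagate)
primary = boto3.client("ec2", region_name="eu-central-1")
primary.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx)

# Route each side's traffic for the other side's CIDR through the peering
primary.create_route(RouteTableId="<primary rtb id>",
                     DestinationCidrBlock="51.0.0.0/16",
                     VpcPeeringConnectionId=pcx)
secondary.create_route(RouteTableId="<secondary rtb id>",
                       DestinationCidrBlock="50.0.0.0/16",
                       VpcPeeringConnectionId=pcx)
```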
- Go to all the zones and create a broker with the following configuration:
  - Engine: Apache ActiveMQ
  - Single-instance broker
  - Durability optimized
  - Broker name: `antipode-lambda-notifications-<region>`
  - Username: `antipode`
  - Password: `antipode1antipode`
  - Broker engine: 5.16.2
  - Select to create a default configuration
  - Select the pre-created VPC config: `antipode-mq`
  - Select the pre-created Security group: `antipode-mq`
  - Disable maintenance
- Double check that you CAN access the public broker management dashboard
- Go to the PRIMARY (writer) zone and edit the created configuration by uncommenting the `networkConnectors` blocks and replacing them with the following (change the URIs as needed; ref: https://docs.aws.amazon.com/amazon-mq/latest/developer-guide/amazon-mq-creating-configuring-network-of-brokers.html):

```xml
<networkConnectors>
  <networkConnector duplex="true" name="ConnectorEuToUs"
      uri="static:(ssl://b-6cfdfde0-2f84-4723-94bd-cc9ada66c2a9-1.mq.us-east-1.amazonaws.com:61617)"
      userName="antipode"/>
  <networkConnector duplex="true" name="ConnectorEuToSg"
      uri="static:(ssl://b-6cfdfde0-2f84-4723-94bd-cc9ada66c2a9-1.mq.us-east-1.amazonaws.com:61617)"
      userName="antipode"/>
</networkConnectors>
```

- Go to the broker again, change the REVISION of the configuration file, and select APPLY IMMEDIATELY
- On your local machine test the queue by creating a consumer on a secondary region connected to the primary region (change the URL):

```
activemq consumer --brokerUrl "ssl://b-20f3cf89-7725-44b0-946b-19e84c03b81e-1.mq.us-east-1.amazonaws.com:61617" \
    --user antipode \
    --password antipode1antipode \
    --destination queue://antipode-notifications
```

  - Double check with a producer:

```
activemq producer --brokerUrl "ssl://b-8b026a92-1858-4a76-bc7a-7bfb25be209d-1.mq.eu-central-1.amazonaws.com:61617" \
    --user antipode \
    --password antipode1antipode \
    --destination queue://antipode-notifications \
    --persistent true \
    --messageSize 1000 \
    --messageCount 10
```

  - Go to the dashboard of the created broker in AWS (ActiveMQ Web Console -> Manage ActiveMQ Broker -> Queues) and you should see 10 messages enqueued and dequeued
- On your local machine create a secret for MQ lambda access in the primary region:

```
aws secretsmanager create-secret --region us-east-1 --name antipode-mq --secret-string '{"username": "antipode", "password": "antipode1antipode"}'
```

  - After creation, edit the secret and replicate it to the secondary regions if needed (ap-southeast)
For a pair of post-storage and notification-storage backends, for instance DynamoDB and SNS respectively, and for two regions as writer and reader, for instance EU and US respectively, do the following:

```
./antipode_lambda build --post-storage dynamo --notification-storage sns --writer eu --reader us
```

To enable Antipode add the `-ant` parameter, and to introduce an artificial delay before publishing the notification add the `--delay <time>` parameter.
Then you run the build with:

```
./antipode_lambda run -r 1000
```

This will trigger 1000 writer lambdas.
After the run ends you can start gathering the results with:

```
./antipode_lambda gather
```

At the end you need to clean your experiment with:

```
./antipode_lambda clean --strong
```

The `--strong` flag will also remove the deployed lambdas. If you omit the flag it will just clean the storages so you can run again.
As an alternative method, you can run our maestrina script (all regions, all combinations) with `./maestrina`.
Note that, if you find any errors, you can always run a single combination using `antipode_lambda` as described above.
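If you want something between a single run and the full maestrina sweep, a small driver script can loop over the combinations you care about (a sketch; the backend flag values are assumptions, so match them to the ones you deployed):

```python
import itertools
import subprocess

POST_STORAGES = ["mysql", "dynamo"]   # extend with e.g. the S3/Redis backends
NOTIF_STORAGES = ["sns", "dynamo"]    # extend with the AMQ backend

for post, notif in itertools.product(POST_STORAGES, NOTIF_STORAGES):
    for extra in ([], ["-ant"]):      # baseline, then with Antipode enabled
        subprocess.run(["./antipode_lambda", "build",
                        "--post-storage", post,
                        "--notification-storage", notif,
                        "--writer", "eu", "--reader", "us", *extra], check=True)
        subprocess.run(["./antipode_lambda", "run", "-r", "1000"], check=True)
        subprocess.run(["./antipode_lambda", "gather"], check=True)
        subprocess.run(["./antipode_lambda", "clean", "--strong"], check=True)
```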
At the end, you can build plots for the consistency window and delay vs. inconsistency.
Copy the sample.yml in plots/configs, rename it, and configure it according to your gather traces.
In your new config file, provide the gather paths in `consistency_window` for each post and notification storage directory.
Note that the antipode trace needs to be listed before the original, as exemplified in the sample file.
Build the plot:

```
./plot plots/configs/sample.yml --plots consistency_window
```

In your new config file, provide the gather paths in `delay_vs_per_inconsistencies` for each post and notification storage directory. Change the combinations as needed and build the plot:

```
./plot plots/configs/sample.yml --plots delay_vs_per_inconsistencies
```

In your new config file, provide the gather paths in `storage_overhead`. Change the combinations as needed and build the plot:
```
./plot plots/configs/sample.yml --plots storage_overhead
```

João Loff, Daniel Porto, João Garcia, Jonathan Mace, Rodrigo Rodrigues.
Antipode: Enforcing Cross-Service Causal Consistency in Distributed Applications.
SOSP 2023.
Paper

Phillipe Ajoux, Nathan Bronson, Sanjeev Kumar, Wyatt Lloyd, Kaushik Veeraraghavan.
Challenges to Adopting Stronger Consistency at Scale.
HotOS 2015.
Paper
Presentation
