Amazon S3 (Simple Storage Service) is an object storage service that offers industry-leading scalability, data availability, security, and performance. A data lake, one of the popular use cases for S3, is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can run data analytics, artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC) applications to unlock the value of your data.
Follow the official guide to install the AWS CLI and configure profiles.
After the installation is complete, you can check the AWS CLI version:
aws --version
aws-cli/2.5.8 Python/3.9.11 Darwin/21.4.0 exe/x86_64 prompt/off
Terraform is an open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure.
To install the terraform binary, visit the official Install Terraform page and follow the instructions.
Or, you can manually get a specific version of the terraform binary from the website. Go to the Downloads page and look for the appropriate package for your system. Download the selected zip archive, unzip it, and install terraform by moving the binary into a directory included in your system's PATH.
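The manual steps above can be sketched as a small shell script. This is a minimal sketch, not an official installer: the version, OS, and architecture values are placeholders chosen for illustration, `/usr/local/bin` is assumed to be on your PATH, and the function names are hypothetical.

```shell
# Placeholder values for illustration; pick the ones that match your system.
TF_VERSION="1.5.7"
TF_OS="darwin"
TF_ARCH="amd64"

# Build the download URL for a release archive on releases.hashicorp.com.
terraform_zip_url() {
  echo "https://releases.hashicorp.com/terraform/${1}/terraform_${1}_${2}_${3}.zip"
}

# Download, unpack, and move the binary into a directory on PATH.
# Nothing is downloaded until you call install_terraform yourself.
install_terraform() {
  url="$(terraform_zip_url "$TF_VERSION" "$TF_OS" "$TF_ARCH")"
  curl -fsSLo terraform.zip "$url" &&
    unzip -o terraform.zip terraform &&
    sudo mv terraform /usr/local/bin/ &&
    terraform version
}
```

Call `install_terraform` after adjusting the three variables. Printing `terraform_zip_url "$TF_VERSION" "$TF_OS" "$TF_ARCH"` first is a cheap way to check the URL before downloading anything.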
Or, you can use the tfenv utility. It is a useful and easy way to install multiple versions of terraform-cli and switch between them.
First, install tfenv using brew.
brew install tfenv
Then, you can use tfenv in your workspace as follows.
tfenv install <version>
tfenv use <version>
tfenv also helps when upgrading to terraform v0.12, a major release focused on configuration language improvements that includes changes you need to consider when upgrading. Because versions 0.11 and 0.12 differ significantly, a tool for switching versions quickly is useful when some code is written for the older version and other code for 0.12.
tfenv list
tfenv install latest
tfenv use <version>
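When projects on different terraform versions live side by side, one option is to pin the version per directory with a `.terraform-version` file, which tfenv reads when no version is given on the command line. A minimal sketch, assuming that tfenv behavior; the helper name and the directory names in the usage comment are hypothetical.

```shell
# Write a .terraform-version file so tfenv picks the right version
# whenever commands run inside that project directory.
pin_terraform_version() {
  dir="$1"
  version="$2"
  printf '%s\n' "$version" > "$dir/.terraform-version"
}

# Usage (directory names are examples only):
#   pin_terraform_version ./legacy-project 0.11.15
#   pin_terraform_version ./new-project 0.12.31
#   (cd ./legacy-project && tfenv install && terraform version)
```

With the file in place, `tfenv install` without arguments installs the pinned version for that directory, so switching between 0.11 and 0.12 code becomes a matter of changing directories.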
module "s3" {
  source = "Young-ook/s3/aws"
  name   = "datalake"
  tags   = { env = "test" }
}
Run terraform:
terraform init
terraform apply
To manage your objects so that they are stored cost effectively throughout their lifecycle, configure their Amazon S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:
- Transition actions: Define when objects transition to another storage class. For example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after you created them, or archive objects to the S3 Glacier storage class one year after creating them. There are costs associated with the lifecycle transition requests. For pricing information, see Amazon S3 pricing.
- Expiration actions: Define when objects expire. Amazon S3 deletes expired objects on your behalf. The lifecycle expiration costs depend on when you choose to expire objects.
For more information, see Object lifecycle management.
module "s3" {
  source = "Young-ook/s3/aws"
  name   = "datalake"
  tags   = { env = "test" }
  lifecycle_rules = [{
    enabled = "true"
    transition = [{
      days          = "120"
      storage_class = "STANDARD_IA"
    }]
    expiration = {
      days = "160"
    }
  }]
}
Modify the terraform configuration file to add a lifecycle rule that applies to objects in the S3 bucket.
terraform init
terraform apply
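After the apply finishes, one way to confirm the rule is to read it back with the AWS CLI. The sketch below is an assumption-laden helper, not part of the module: the function names are hypothetical, and the bucket name must be the one terraform actually created, which may differ from `datalake`.

```shell
# Print the lifecycle rules currently attached to a bucket.
# Requires AWS credentials; $1 is the bucket name.
show_lifecycle() {
  aws s3api get-bucket-lifecycle-configuration --bucket "$1"
}

# Days an object spends in the transition storage class before it expires.
# $1 is the transition day, $2 is the expiration day.
days_in_transition_class() {
  echo $(( $2 - $1 ))
}

# Usage:
#   show_lifecycle <bucket-name>
#   days_in_transition_class 120 160
```

With the values from the example configuration, objects move to STANDARD_IA on day 120 and expire on day 160, so they spend 40 days in that class. That is above the 30-day minimum storage duration that S3 Standard-IA charges for, so the rule does not incur early-deletion charges.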
S3 Intelligent-Tiering is a new Amazon S3 storage class designed for customers who want to optimize storage costs automatically when data access patterns change, without performance impact or operational overhead. S3 Intelligent-Tiering is the first cloud object storage class that delivers automatic cost savings by moving data between access tiers — frequent access, infrequent access, archive, deep archive — when access patterns change, and is ideal for data with unknown or changing access patterns.
S3 Intelligent-Tiering stores objects in many access tiers. For a small monthly monitoring and automation fee per object, S3 Intelligent-Tiering monitors access patterns and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. There are no retrieval fees in S3 Intelligent-Tiering. If an object in the infrequent access tier is accessed later, it is automatically moved back to the frequent access tier. No additional tiering fees apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class. S3 Intelligent-Tiering is designed for 99.9% availability and 99.999999999% durability, and offers the same low latency and high throughput performance of S3 Standard.
For more information, see Amazon S3 Intelligent-Tiering. You can also learn how to add archive tiers to your Intelligent-Tiering storage class for lower storage costs in the blog post S3 Intelligent-Tiering Adds Archive Access Tiers.
module "s3" {
  source = "Young-ook/s3/aws"
  name   = "datalake"
  tags   = { env = "test" }
  lifecycle_rules = [{
    enabled = "true"
    transition = [
      {
        days          = "0"
        storage_class = "INTELLIGENT_TIERING"
      },
    ]
  }]
  intelligent_tiering_archive_rules = {
    state = "Enabled"
    filter = [{
      prefix = "logs/"
      tags = {
        priority = "high"
        class    = "blue"
      }
    }]
    tiering = [{
      access_tier = "ARCHIVE_ACCESS"
      days        = 125
      }, {
      access_tier = "DEEP_ARCHIVE_ACCESS"
      days        = 180
    }]
  }
}
Modify the terraform configuration file to add a lifecycle rule that transitions objects in the S3 bucket to Intelligent-Tiering, together with archive tiering rules.
terraform init
terraform apply
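The archive thresholds in the configuration above have to fall inside the ranges S3 accepts. The sketch below checks a value against the limits as documented at the time of writing (90 to 730 days for ARCHIVE_ACCESS, 180 to 730 days for DEEP_ARCHIVE_ACCESS); treat those numbers as an assumption to re-check against the S3 documentation. The function names are hypothetical.

```shell
# Return success when the day count is valid for the given access tier.
# Limits are assumptions taken from the S3 documentation at the time of writing.
valid_tiering_days() {
  case "$1" in
    ARCHIVE_ACCESS)      min=90 ;;
    DEEP_ARCHIVE_ACCESS) min=180 ;;
    *) return 1 ;;
  esac
  [ "$2" -ge "$min" ] && [ "$2" -le 730 ]
}

# List the Intelligent-Tiering configurations attached to a bucket.
# Requires AWS credentials; $1 is the bucket name.
show_tiering() {
  aws s3api list-bucket-intelligent-tiering-configurations --bucket "$1"
}

# Usage:
#   valid_tiering_days ARCHIVE_ACCESS 125 && echo "ok"
#   show_tiering <bucket-name>
```

Both values in the example (125 and 180 days) pass this check. After the apply, `show_tiering` should list a configuration with the `logs/` prefix filter and the two tiering entries.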
With Amazon S3 bucket policies, you can secure access to objects in your buckets, so that only users with the appropriate permissions can access them. You can even prevent authenticated users without the appropriate permissions from accessing your Amazon S3 resources. This example shows how to configure a bucket policy to allow access to buckets through VPC endpoints only for security and compliance.
For more information, refer to the Controlling access from VPC endpoints with bucket policies user guide.
module "s3" {
  source = "Young-ook/s3/aws"
  name   = "datalake"
  tags   = { env = "test" }
  bucket_policy = {
    vpce-only = {
      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            # Deny the listed actions unless the request arrives through the VPC endpoint
            Sid = "AllowAccessFromVpcEndpoint"
            Action = [
              "s3:GetObject",
              "s3:PutObject",
              "s3:ListBucket"
            ]
            Effect = "Deny"
            Principal = {
              AWS = flatten([module.aws.caller.account_id, ])
            }
            Resource = [join("/", [module.s3.bucket.arn, "*"]), module.s3.bucket.arn, ]
            Condition = {
              StringNotEquals = {
                "aws:sourceVpce" = module.vpc.vpce.s3.id
              }
            }
          },
        ]
      })
    }
  }
}
Modify the terraform configuration file to apply a bucket policy for access control.
terraform init
terraform apply
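To check that the policy behaves as intended, you can read it back and then try a denied action from a machine outside the VPC. A minimal sketch under those assumptions; the function names are hypothetical and the bucket name is whatever terraform created.

```shell
# Print the bucket policy document as JSON text.
# Requires AWS credentials; $1 is the bucket name.
show_bucket_policy() {
  aws s3api get-bucket-policy --bucket "$1" --query Policy --output text
}

# Succeed only when the wrapped command fails, for example with AccessDenied.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    return 1
  else
    return 0
  fi
}

# Usage, from a host that is NOT routed through the VPC endpoint:
#   show_bucket_policy <bucket-name>
#   expect_failure aws s3 ls "s3://<bucket-name>" && echo "listing is blocked"
```

Reading the policy still works from outside the VPC because the statement only denies `s3:GetObject`, `s3:PutObject`, and `s3:ListBucket`. Note that `expect_failure` cannot tell an AccessDenied error from other failures such as missing credentials, so inspect the error message when in doubt.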
- Data Mesh: A Monolithic Data Lake to a Distributed Data Mesh
- Data Mesh Architecture
- Amazon DataZone - Data Mesh and Modern Data Architecture on AWS
- Working with Mountpoint for Amazon S3
- Mountpoint for Amazon S3 configuration
- Mountpoint for Amazon S3 file system behavior
