All You Need to Know About Amazon S3


S3 (Simple Storage Service) is one of the core services that AWS (Amazon Web Services) provides, and probably one of its most famous. It is a cloud storage service that stores objects (data) with descriptive metadata inside globally uniquely named buckets, which are similar to folders on your computer.

What are objects and buckets?

Buckets are similar to the folders on your computer that contain your data, files, etc. Using buckets you can store an unlimited amount of data in a safe place, and that data can be replicated across multiple AWS Availability Zones.

Objects represent your data, similar to what you store inside the folders on your computer: photos, videos, backups, etc.

Each object stored inside a bucket is assembled from:

  1. Key: the name of the object (e.g. photos/cat.jpg)
  2. Value: the data itself, made of a sequence of bytes
  3. Version ID: identifies the specific version of the object (used when versioning is enabled)
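To make those parts concrete, here is a minimal Python sketch of the parameters an S3 PUT request carries: the Key and Value described above, plus optional descriptive metadata. The bucket name, key, and metadata are hypothetical examples, and the Version ID is assigned by S3 on write, not by the client.

```python
# Assemble the parameters for storing one object in a bucket, in the shape
# used by S3's PutObject API. No request is sent here; this only shows how
# the Key/Value/metadata parts fit together. Names are illustrative.
def build_put_object_params(bucket, key, body, metadata=None):
    """Return the parameter dict for a single S3 PutObject call."""
    params = {
        "Bucket": bucket,   # globally uniquely named bucket
        "Key": key,         # the object's name (the "Key")
        "Body": body,       # the value: a sequence of bytes
    }
    if metadata:
        params["Metadata"] = metadata  # descriptive metadata
    return params

params = build_put_object_params(
    "my-example-bucket", "photos/cat.jpg", b"image bytes...", {"camera": "phone"}
)
```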

What are S3's features?

  • all data stored in S3 is kept safe.
  • data is stored redundantly across multiple devices and facilities.
  • each object can be from 0 bytes up to 5 TB.
  • unlimited storage.
  • AWS guarantees 99.99% availability and 99.999999999% (eleven 9s) durability for S3, which means once you upload an object it is nearly impossible to lose it.
  • stored data can be encrypted.
  • stored data can be versioned.
  • Multi-Factor Authentication can be required for the deletion process (MFA Delete).

What are S3 classes and their use cases?

S3 Standard is the best-known of the S3 storage classes, but what about the others?

S3 Standard

  • 99.99% availability
  • 99.999999999% durability
  • stored redundantly across multiple devices and multiple facilities
  • designed to sustain the concurrent loss of two facilities

S3 IA (Infrequent Access)

  • used for data that is accessed less frequently but requires rapid access when needed.
  • lower storage fee than S3 Standard.
  • a retrieval fee is charged per access.

S3 One Zone-IA (Infrequent Access)

  • a lower-cost option for infrequently accessed data.
  • for data that doesn't require multi-Availability-Zone resilience.

S3 Intelligent-Tiering

  • designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.
  • monitors access patterns to decide when to move objects between tiers.

S3 Glacier

  • secure, durable, low-cost storage for data archiving.
  • retrieval times configurable from minutes to hours.
  • very cheap compared to the other classes.

S3 Glacier Deep Archive

  • lowest-cost storage class.
  • for data where a retrieval time of 12 hours is acceptable (you submit a retrieval request and the data comes back roughly 12 hours later).

How does S3 charge me?

  • Storage space
  • number of requests
  • storage management pricing
  • data transfer pricing
  • transfer acceleration
  • cross-region replication: if enabled, a file uploaded to a bucket in one region (say, N. Virginia) is replicated to buckets in the other configured regions, and that transfer is billed.

How secure is S3?

First of all, newly created buckets are PRIVATE by default, which means no one can access any file in the bucket unless you explicitly allow it.

S3 buckets can be configured to create access logs, and the logs can be stored in a completely different bucket, or even in another AWS account.

There are multiple methods to encrypt S3 data:

  1. Encryption in transit, achieved with SSL/TLS

  2. Encryption at rest (server side), achieved by multiple methods

    • S3-Managed Keys (SSE-S3), where the keys are managed by S3
    • AWS Key Management Service Managed Keys (SSE-KMS), where the keys are managed by both you and AWS
    • Server-Side Encryption with Customer-Provided Keys (SSE-C), where AWS handles the encryption using keys you provide
  3. Client-side encryption, achieved by encrypting the object yourself before uploading it to S3
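As a hedged sketch of how those server-side options map onto request parameters (the shapes below follow the S3 PutObject API as exposed by SDKs such as boto3; no request is sent, and the KMS key ID is a hypothetical example):

```python
# Map each server-side encryption mode to the extra PutObject parameters
# it requires. Parameter names follow the S3 API; this is a sketch, not
# a full client.
def encryption_args(mode, kms_key_id=None):
    """Return the extra PUT parameters for a given server-side encryption mode."""
    if mode == "SSE-S3":                           # keys managed by S3
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":                          # keys managed via AWS KMS
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id       # optional customer-managed key
        return args
    if mode == "SSE-C":                            # customer-provided key
        # the key itself is sent with each request (over TLS), not stored by S3
        return {"SSECustomerAlgorithm": "AES256"}
    raise ValueError(f"unknown encryption mode: {mode}")
```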

S3 Lifecycle and Versioning

S3 does not enable versioning by default; it must be enabled by the user, per bucket.

Enabling versioning will store all versions of an object, including all writes and even deletes. Versioning cannot be disabled, only suspended: suspending stops creating new versions for subsequently uploaded objects but keeps all existing versions. Each version has its own public/private access settings. On object deletion, all versions remain, but the object gets a delete marker and is hidden from view; to restore the object, simply delete the delete marker.

S3 provides lifecycle rules for buckets, which give you the power to automate moving objects between storage classes (and can be used in conjunction with versioning).

Lifecycle examples:

  • move files to another storage class after 30 days.
  • move files to yet another storage class after 60 days.
  • delete previous versions of an object 30 days after they have been overwritten.
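A lifecycle configuration implementing those examples might look like the following sketch, in the shape accepted by S3's PutBucketLifecycleConfiguration API (the rule ID, prefix, and day counts are illustrative):

```python
# One lifecycle rule combining the three examples above: transition to
# Standard-IA after 30 days, to Glacier after 60, and expire noncurrent
# (overwritten) versions 30 days after they stop being current.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # apply only to objects under logs/
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}
```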

Object Lock and Glacier Vault Lock

S3 Object Lock allows you to store objects using a WORM model (write once, read many). It can help prevent objects from being deleted or modified for a fixed amount of time or indefinitely, and can be used to meet regulatory requirements or to add an extra layer of protection against changes and deletions. It has multiple modes:

  • Governance mode
    • users cannot overwrite or delete an object version or alter its lock settings unless they have special permissions
    • protects against deletion by the majority of users.
    • you can still grant some users permission to alter the retention settings or delete the object if necessary.
  • Compliance mode
    • a protected object version cannot be overwritten or deleted by any user, including the root user of the account

    • its retention mode can't be changed and its retention period can't be shortened.

    • ensures an object version can't be overwritten or deleted for the duration of the retention period

An Object Lock retention period protects an object version for a fixed amount of time. When you place a retention period on an object version, S3 stores a timestamp in the object version's metadata to indicate when the retention period will expire. After the retention period expires, the object version can be overwritten or deleted, unless you've also placed a legal hold on the object version.

A legal hold prevents an object version from being overwritten or deleted. A legal hold has no retention period; it remains in effect until it is removed by any user who has the s3:PutObjectLegalHold permission.
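The two mechanisms can be sketched as request parameters, following the shapes of S3's PutObjectRetention and PutObjectLegalHold APIs. The bucket, key, and one-year period below are hypothetical examples, and no request is actually sent:

```python
from datetime import datetime, timedelta, timezone

# Retention period: S3 stores RetainUntilDate in the object version's
# metadata and blocks overwrites/deletes until that timestamp passes.
retain_until = datetime.now(timezone.utc) + timedelta(days=365)

retention_params = {
    "Bucket": "compliance-bucket",          # hypothetical bucket
    "Key": "records/2023.csv",              # hypothetical key
    "Retention": {
        "Mode": "GOVERNANCE",               # or "COMPLIANCE"
        "RetainUntilDate": retain_until,
    },
}

# Legal hold: no expiry date; stays until explicitly set to "OFF" by a
# user with the s3:PutObjectLegalHold permission.
legal_hold_params = {
    "Bucket": "compliance-bucket",
    "Key": "records/2023.csv",
    "LegalHold": {"Status": "ON"},
}
```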

Performance

AWS S3 has extremely low latency: you get the first byte out of S3 within 100-200 milliseconds. S3 supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix; a prefix is the part of the object key between the bucket name and the object name.
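A small sketch of what "prefix" means in practice: everything in the object key up to and including the final slash, before the object's own name. Each distinct prefix gets its own request-rate allowance.

```python
# Extract the prefix from an S3 object key. The key is everything after
# the bucket name; the prefix is the "folder path" portion of the key.
def key_prefix(key: str) -> str:
    """Return the prefix of an object key, or '' if the key has no slashes."""
    head, sep, _name = key.rpartition("/")
    return head + sep if sep else ""

print(key_prefix("2023/06/reports/summary.csv"))  # prints "2023/06/reports/"
print(key_prefix("summary.csv"))                  # no slash: empty prefix
```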

Tips to improve S3 performance

  1. Spread reads across prefixes → the more prefixes you have, the better the performance you get

  2. Uploading

    • use multipart upload when you can; it is recommended for files over 100 MB and required for files over 5 GB.
    • S3 lets you parallelize uploads for efficiency: split the data into parts, then upload the parts in parallel.
  3. Downloads

    • parallelize downloads by specifying byte ranges; if a download fails, it fails only for that specific byte range.
    • S3 byte-range fetches can be used to speed up downloads, or to download only part of a file.
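The download tips above can be sketched as byte-range planning: split an object of a given size into HTTP `Range` headers, one per part, so each part can be fetched in parallel and retried independently. The sizes used are illustrative.

```python
# Plan the Range headers for a parallelized S3 download. GetObject accepts
# a Range header of the form "bytes=start-end" (inclusive on both ends).
def byte_ranges(size: int, part_size: int):
    """Yield 'bytes=start-end' Range headers covering an object of `size` bytes."""
    for start in range(0, size, part_size):
        end = min(start + part_size, size) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"

print(list(byte_ranges(10, 4)))  # → ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

If the fetch for `bytes=4-7` fails, only those four bytes need to be re-requested.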

Keep in mind the SSE-KMS (Server-Side Encryption with the Key Management Service) limits when you encrypt objects: uploading a file calls GenerateDataKey in the KMS API, and downloading a file calls Decrypt in the KMS API. Both count towards the KMS request quota. The quota is region-specific: 5,500, 10,000, or 30,000 requests per second, depending on the region. You currently cannot request a quota increase for KMS.

S3 Transfer Acceleration utilizes the CloudFront edge network to accelerate your uploads to S3. Instead of uploading directly to the S3 bucket, you use a distinct URL to upload to a nearby edge location, which then transfers the files on to the S3 bucket. The distinct URL looks like xxxxxx.s3-accelerate.amazonaws.com.
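The distinct endpoint follows a predictable pattern, with the bucket name in place of the xxxxxx placeholder. A tiny sketch (bucket and key names are hypothetical):

```python
# Build the transfer-acceleration URL for an object: same bucket and key,
# but the s3-accelerate endpoint routes the upload through the nearest
# CloudFront edge location.
def accelerate_url(bucket: str, key: str) -> str:
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("my-bucket", "backups/db.tar.gz"))
# → https://my-bucket.s3-accelerate.amazonaws.com/backups/db.tar.gz
```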

Select and Glacier Select

S3 Select enables applications to retrieve only a subset of data from an object using simple SQL expressions. Select gives you the power to fetch only the data the application needs, which can lead to performance increases, in many cases up to 400% faster and 80% cheaper. On the other hand, Glacier Select is used by highly regulated companies (financial services, healthcare, and others) that write data directly to Amazon Glacier to satisfy compliance needs like SEC Rule 17a-4 or HIPAA; Glacier Select allows you to run SQL queries against Glacier directly.
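An S3 Select request can be sketched as follows, in the shape used by the SelectObjectContent API. The bucket, key, column names, and filter are hypothetical examples, and no request is sent:

```python
# Request only the rows and columns the application needs from a CSV
# object, instead of downloading the whole file and filtering locally.
select_params = {
    "Bucket": "analytics-bucket",          # hypothetical bucket
    "Key": "sales/2023.csv",               # hypothetical CSV object
    "ExpressionType": "SQL",
    "Expression": "SELECT s.region, s.total FROM S3Object s WHERE s.total > 100",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # use header row
    "OutputSerialization": {"CSV": {}},
}
```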

Cross-region replication

  • versioning must be enabled on both the source and destination buckets.
  • files already in the bucket when replication is enabled are not replicated automatically.
  • all subsequently updated files are replicated automatically.
  • delete markers are not replicated.
  • deletions of individual versions or of delete markers are not replicated.
  • object permissions are replicated along with the objects.
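A replication setup can be sketched in the shape accepted by the PutBucketReplication API (the IAM role ARN and bucket names are hypothetical, and versioning must already be enabled on both buckets):

```python
# Replicate all new objects from the source bucket to a destination bucket.
# Existing objects are not replicated; delete markers are explicitly not
# replicated here, matching the behaviour described above.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/replication-role",  # hypothetical
    "Rules": [
        {
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter: applies to all new objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-replica-bucket"},
        }
    ],
}
```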

Transferring data to S3

There are multiple options to move your data to S3 other than basic uploading.

The DataSync agent is deployed on a server and connected to your NAS or file system to copy data to AWS and write data back from AWS. It automatically encrypts data and accelerates transfers over the WAN, and it performs automatic data-integrity checks in transit and at rest.

Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Snowball comes in 50 TB and 80 TB sizes. It uses multiple layers of security to protect your data, including tamper-resistant enclosures, 256-bit encryption, and an industry-standard Trusted Platform Module (TPM) designed to ensure both security and full chain of custody for your data. Once the data transfer job has been processed and verified, AWS performs a software erasure of the Snowball appliance.

Snowball Edge is a 100 TB data transfer device with on-board storage and compute capabilities.

Snowmobile is an exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100 PB per Snowmobile, a 45-foot-long ruggedized shipping container pulled by a semi-trailer truck.