AWS

Feb 6, 2021

Data engineering

Storage topics

S3 data lakes: upload files to S3 and partition them by key prefix (for example by date)
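A minimal sketch of what partitioning can look like in practice, assuming a Hive-style `year=/month=/day=` key layout. The dataset name `clickstream`, the file name, and the helper `partitioned_key` are hypothetical and only here for illustration.

```python
from datetime import date


def partitioned_key(dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 object key (hypothetical helper)."""
    return (
        f"{dataset}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


# Hypothetical dataset and file name, used only to show the layout.
key = partitioned_key("clickstream", date(2021, 2, 6), "events.json")
print(key)  # clickstream/year=2021/month=02/day=06/events.json
```

The resulting string would be used as the object key when uploading, for example as the `Key` argument of boto3's `upload_file`. Query engines that understand this layout can then skip partitions that a query does not need.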

S3 storage tiers (from frequent to infrequent access)

S3 Lifecycle rules: move objects between storage classes, General Purpose (Standard) -> Infrequent Access -> Glacier (data that is almost never accessed). Transition actions move objects after a set age (e.g. 60 days, 6 months); expiration actions delete them.
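A rough sketch of such a rule as a lifecycle configuration document, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The prefix `logs/`, the rule ID, and the 365-day expiration are assumptions for illustration; the 60-day and 180-day (about 6 months) transitions mirror the note above.

```python
import json


def lifecycle_configuration(prefix: str) -> dict:
    """Return a lifecycle configuration with two transitions and one expiration."""
    return {
        "Rules": [
            {
                "ID": "tier-down-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Transition actions: move objects to colder storage classes.
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                # Expiration action: delete objects (365 days is an assumption).
                "Expiration": {"Days": 365},
            }
        ]
    }


config = lifecycle_configuration("logs/")
print(json.dumps(config, indent=2))
```

Building the document as plain data keeps the rule easy to review before it is applied to a bucket.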

S3 Security: user-based (IAM policies) / resource-based (bucket policies, ACLs)
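To make the difference concrete, here is a sketch of the two policy shapes as plain JSON documents. The bucket name, account ID, and role ARN are hypothetical. The main thing to notice: a user-based (identity) policy is attached to a user or role and has no `Principal`, while a resource-based bucket policy is attached to the bucket and names the `Principal` it applies to.

```python
import json

BUCKET = "example-data-lake"  # hypothetical bucket name
ROLE_ARN = "arn:aws:iam::123456789012:role/analyst"  # hypothetical role


def identity_policy(bucket: str) -> dict:
    """User-based policy: attached to an IAM user/role, so no Principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Resource-based policy: attached to the bucket, names who gets access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(identity_policy(BUCKET), indent=2))
print(json.dumps(bucket_policy(BUCKET, ROLE_ARN), indent=2))
```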

DynamoDB

Data Transformation

Glue
Glue ETL (runs on a serverless Apache Spark platform)

Streaming

Kinesis: alternative to Kafka, great for application logs, metrics, IoT, clickstreams, and real-time big data in general
Kinesis Video Streams: streaming video in real time

-Streams are divided into ordered shards/partitions. Shards have to be provisioned in advance (capacity planning).

-Ability to reprocess/replay data

-multiple applications can consume the same stream

-Records are small and written with low latency

-Kinesis Data Firehose: data ingestion into Redshift / Amazon S3 / Elasticsearch / Splunk

-Kinesis Data Firehose: automatic scaling
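The shard model in the notes above can be sketched in a few lines. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit hash key and routes the record to the shard whose hash key range contains it. The sketch below assumes the shards split the hash space evenly (typical for a freshly created stream, not guaranteed after resharding) and uses the provisioned per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads. The function names and sample numbers are hypothetical.

```python
import hashlib
import math

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def shards_needed(write_mb_per_s: float, records_per_s: float,
                  read_mb_per_s: float) -> int:
    """Estimate the shard count from per-shard write and read limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / 1.0),  # 1 MB/s write per shard
        math.ceil(records_per_s / 1000),  # 1,000 records/s write per shard
        math.ceil(read_mb_per_s / 2.0),   # 2 MB/s read per shard
    )


# Hypothetical workload: 5 MB/s in, 3,000 records/s, 8 MB/s out.
count = shards_needed(5, 3000, 8)
print(count)  # 5
print(shard_for_key("device-42", count))
```

Records with the same partition key always land on the same shard, which is what preserves their ordering. It is also why a single very hot key can overload one shard even when the stream as a whole has spare capacity.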