## Notes

- [[#Domain 1 28% Data Preparation for Machine Learning]]
- 26% ML Model Development
- 22% Deployment and Orchestration of ML Workflows
- 24% ML Solution Monitoring, Maintenance, and Security

## Domain 1: 28% Data Preparation for Machine Learning

- Data formats and ingestion mechanisms
    - Validated formats
    - Non-validated formats
    - Apache Parquet
    - JSON
    - CSV
    - Apache ORC
    - Apache Avro
    - RecordIO
- Core AWS data services
    - S3
    - Amazon Elastic File System (EFS)
    - Amazon FSx for NetApp ONTAP
- AWS streaming services
    - Kinesis
    - Flink
    - Kafka
- AWS storage options and tradeoffs
- Extracting data from storage
    - S3
    - Elastic Block Store (EBS)
    - EFS
    - RDS
    - DynamoDB
    - S3 Transfer Acceleration
    - EBS Provisioned IOPS
- Choosing appropriate data formats based on access patterns
- Ingesting data into SageMaker Data Wrangler and SageMaker Feature Store
- Merging data from multiple sources (AWS Glue, Spark)

## References

- [[AWS MOC]]