Data Engineer

Kaizen Global Technologies

Design, develop, and maintain scalable batch and real-time data ingestion frameworks using AWS services, Python, PySpark, and SQL.
Build, orchestrate, and optimize end-to-end data pipelines using Amazon Managed Workflows for Apache Airflow (MWAA), Amazon Redshift, AWS Glue, and Amazon SageMaker Unified Studio.
Develop low-latency streaming solutions using Amazon Kinesis, Apache Kafka, or similar technologies, ensuring high availability, performance, and reliability.
Implement data transformation, validation, monitoring, governance, and observability solutions while ensuring security through AWS IAM, VPC, and CloudWatch.
Collaborate with cross-functional teams to design data lakehouse and data warehouse solutions, automate CI/CD deployments, and deliver scalable, reusable data engineering frameworks.

Strong hands-on experience with AWS, Python, PySpark, SQL, Amazon Redshift, Apache Airflow (Amazon MWAA), and Amazon SageMaker Unified Studio.
Expertise in building batch and real-time data ingestion pipelines using Amazon Kinesis, Apache Kafka, AWS Glue, APIs, databases, and file systems.
Experience with relational and NoSQL databases such as Oracle, Teradata, Amazon RDS, DynamoDB, MongoDB, and Snowflake, including data modeling and query optimization.
Solid understanding of data warehouse and data lakehouse architectures, ETL/ELT processes, workflow orchestration, monitoring, CI/CD, Git, CloudFormation, and Terraform.
Knowledge of machine learning pipeline orchestration, SageMaker IDE (JupyterLab, Spaces), data governance, performance optimization, and cloud security best practices across core AWS services (IAM, VPC, CloudWatch, S3, Lambda, API Gateway, SQS/SNS).

Please send your CV to ***email_hidden***