Responsibilities
- Working with the Cloud and System Engineering teams to deploy new EMR environments or expand existing ones (ETL EMR, HBase EMR, Presto EMR, NPIX EMR, Data Science EMR)
- Working with Application teams to set up new EMR Hadoop/Presto users
- Maintaining Hadoop clusters, including adding and removing nodes with tools such as Ansible
- Monitoring Hadoop cluster job performance and planning capacity
- Monitoring Hadoop cluster connectivity and security, managing and reviewing Hadoop log files
- Managing and monitoring the Hadoop Distributed File System (HDFS), providing support and maintenance
- Working with the Infrastructure, Network, Database, Application, and Business Intelligence teams to guarantee high data quality and availability
- Working with Application teams and external vendor partner resources to install operating systems, Hadoop updates, patches, and version upgrades when required
- Work with AWS to open support cases and follow up on issue resolution
- Monitor S3 usage and growth; review and implement lifecycle policies to manage that growth
- Manage user access to EMR environments and S3
- Address connectivity issues and other performance issues raised by consumers of the platform
- Set up and maintain alerts using appropriate tools (Prometheus, Kibana, CloudWatch, etc.)
- Take ownership of the platform automation Ansible scripts (EMR configuration and others), enhancing and maintaining them as needed
- Maintain and monitor the health of the cluster, adding and removing nodes using native AWS EMR admin utilities and Ansible