This client runs a modern, quality-driven, microservices-based distributed system in the public cloud (AWS). Their stack includes Scala, Play, Akka, MongoDB, PostgreSQL, Docker, Mesos, Marathon, Jenkins, Kafka, Spark, and HDFS. They continuously evaluate their tech stack and have a straightforward process for suggesting and adopting new technologies. Their engineers enjoy being empowered and accountable.
We are looking for an experienced Hadoop developer who can build and troubleshoot Big Data pipelines in a Hadoop environment.
Hands-on development and maintenance of the Cloudera-based Big Data platform
Create data pipelines that extract and cleanse data from a variety of sources and formats for reporting and analytics
Build tools and frameworks to enable data flow patterns
Work closely with Product and Project Managers to understand features, perform technical assessments, and design, code, test, and deliver
Strong development skills around Hadoop, Spark, Hive, Kafka, and Airflow
Strong SQL, Python, and shell scripting skills
Strong understanding of Hadoop internals
Experience with AWS components and services, particularly S3
Bachelor’s degree in Computer Science or a related field
Experience building ETL pipelines and familiarity with ETL design principles
3–5 years of programming experience in Java, Scala, or Python
Cloudera experience is a plus
Understanding of data security principles and Kerberos is a big plus.