3+ years of experience in building and operating data pipelines in Apache Spark or Apache Flink.
2+ years of experience with workflow orchestration tools such as Apache Airflow or Dagster.
Proficient in Java and in build and packaging tools such as Maven and Gradle.
Adept at writing efficient SQL queries and troubleshooting query plans.
Experience managing large-scale data on cloud storage.
Strong problem-solving skills and attention to detail; able to debug failed jobs and queries in minutes.
Operational excellence in monitoring, deploying, and testing job workflows.
Open-minded, collaborative self-starter who moves fast.
Hands-on experience with Kubernetes (k8s) and its related toolchain in cloud environments.
Experience operating and optimizing terabyte-scale data pipelines.
Deep understanding of Spark, Flink, Presto, Hive, and Parquet internals.
Hands-on experience with open-source projects such as Hadoop, Hive, Delta Lake, Hudi, NiFi, Drill, Pulsar, Druid, Pinot, etc.
Operational experience with stream-processing pipelines using Apache Flink or Kafka Streams.