Design, develop, and maintain scalable data pipelines and ETL processes to support data analytics and machine learning initiatives.
Utilize workflow management platforms to orchestrate and automate data engineering pipelines.
Implement data models and schemas to support data storage, retrieval, and analysis.
Work closely with data scientists and analysts to understand data requirements and provide data engineering solutions.
Collaborate with software engineers to integrate data pipelines with existing systems and applications.
Implement data quality monitoring and validation processes to ensure data integrity and accuracy.
Optimize data infrastructure to meet scalability and performance requirements.
Set up instrumentation and monitoring to track production health metrics and proactively alert on any performance degradation.
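The responsibilities above center on orchestrating pipeline steps in dependency order. As a minimal, platform-agnostic sketch of that idea (the task names and data are illustrative, not any specific orchestrator's API), the standard library's `graphlib` can order extract/transform/load steps:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps; real tasks would read from and write to
# external systems rather than pass Python lists around.
def extract():
    return [1, 2, 3]

def transform(rows):
    return [r * 10 for r in rows]

def load(rows):
    return len(rows)  # e.g., count of rows written

# Declare dependencies: transform needs extract, load needs transform.
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

def run_pipeline():
    results = {}
    # static_order() yields tasks so that every dependency runs first.
    for task in TopologicalSorter(deps).static_order():
        if task == "extract":
            results[task] = extract()
        elif task == "transform":
            results[task] = transform(results["extract"])
        elif task == "load":
            results[task] = load(results["transform"])
    return results
```

Production platforms such as Airflow express the same dependency graph declaratively and add scheduling, retries, and monitoring on top.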
Requirements
Proficiency in Python for scripting and data manipulation.
Solid understanding of data modeling concepts and database technologies (e.g., SQL, NoSQL, relational databases).
Experience working with monitoring tools such as Datadog is a plus.
Experience with data warehouse solutions such as Snowflake, Redshift, or BigQuery.
Strong experience with Google Cloud Platform and its data services (e.g., BigQuery, Airflow).
Strong experience working with relational databases such as PostgreSQL.
Excellent problem-solving skills and attention to detail.
Strong communication and collaboration skills, with the ability to work effectively in a team environment.
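The requirements above emphasize Python for data manipulation and the earlier responsibilities call for data quality validation. As a minimal sketch of that kind of check (the field names and ranges are hypothetical), a QA gate might flag rows with missing or out-of-range values before loading:

```python
def validate_rows(rows, required, ranges):
    """Return a list of (row_index, field, problem) tuples for rows
    that fail null or range checks — a common shape for a data
    quality gate run before loading into a warehouse."""
    errors = []
    for i, row in enumerate(rows):
        # Required fields must be present and non-null.
        for field in required:
            if row.get(field) is None:
                errors.append((i, field, "missing"))
        # Numeric fields must fall within their allowed [lo, hi] range.
        for field, (lo, hi) in ranges.items():
            value = row.get(field)
            if value is not None and not (lo <= value <= hi):
                errors.append((i, field, "out of range"))
    return errors
```

In practice such checks would run inside the pipeline and feed alerts to a monitoring tool rather than return a plain list.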