Job Description
About Fusemachines
Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 350 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.
About the role
This is a remote, 3-month contract role. We are seeking a Senior Data Quality Engineer with expertise in testing data pipelines, cloud infrastructure, and ETL processes on the Google Cloud Platform (GCP). The ideal candidate will lead quality assurance initiatives for complex data processing systems, implement automated testing frameworks, and establish QA best practices for the project.
As a Senior Data Quality Assurance Engineer, you will be responsible for ensuring the accuracy, completeness, reliability, and efficiency of data movement and transformation processes. You will collaborate with cross-functional teams, including data engineers and data architects, to validate and improve data pipelines, identify issues, and implement quality assurance measures and processes.
Preferred Qualifications:
- Bachelor's degree (full-time) in Computer Science, Engineering, or a related field such as Statistics or Mathematics, from a top-tier school.
- 8+ years of QA experience, with at least 3 years focusing on data platforms, working on big datasets using different data sources.
- Proven experience delivering projects and products for Data and Analytics as a data quality engineer.
- Google Cloud certifications.
Required Skills/Competencies
- 3+ years of experience with GCP Cloud Platform.
- Minimum 8 years of relevant experience in Data Quality Assurance or a similar role, preferably within a data-driven environment, focusing on ETL/ELT, Data Testing, and BI processes.
- Demonstrated expertise in the Python language and its testing frameworks, such as pytest and unittest, for developing efficient automated tests.
- Proficient in SQL and data validation techniques to ensure data accuracy and consistency.
- Skilled in API testing tools, including Postman and REST-assured, for verifying API functionality and reliability.
- Experienced in performance testing tools like JMeter and Locust to assess system stability and response under load.
- Hands-on experience with GCP services, including BigQuery, Cloud Composer, and Dataflow, to support cloud-based testing and data integrity.
- Extensive experience in testing ETL processes and data pipelines to ensure accurate data transformation and flow.
- Knowledgeable in data warehousing solutions to verify efficient data storage and retrieval.
- Strong understanding of data quality principles and governance to uphold data integrity across systems.
- Proficient in version control systems like Git and CI/CD tools for seamless code management and deployment.
- Experienced with infrastructure-as-code tools such as Terraform and CloudFormation for reliable and secure infrastructure verification.
- Skilled in container testing with Docker and Kubernetes to ensure consistency within containerized environments.
- Knowledgeable in security testing tools and methodologies to identify and mitigate system vulnerabilities.
- Proficient in test automation frameworks like Selenium and Cypress to streamline and automate testing.
- Understanding of data compliance standards like GDPR and CCPA to maintain data privacy and regulatory adherence.
- Experienced with agile testing methodologies to support flexible and responsive testing within agile development processes.
- Experience developing test strategies with risk-based testing, ensuring comprehensive test coverage, defining test metrics and reporting, and establishing quality gates to maintain standards.
- Develop validation frameworks for data accuracy, verify schemas, test data transformations, and perform integrity checks to ensure data consistency.
- Conduct load, stress, and scalability testing to assess system performance, and monitor for performance bottlenecks.
- Validate access controls, test data encryption, and ensure compliance with security regulations.
- Solid understanding of SDLC, STLC, BDD, Bug/Defect Life Cycle.
- Experience using test management and defect tracking platforms such as JIRA.
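To illustrate the Python and data-validation skills listed above, here is a minimal sketch of the kind of automated data-quality check the role involves. The sample rows and rules are hypothetical, not drawn from any real pipeline; in practice such checks would typically run under pytest against query results.

```python
# Minimal sketch of automated data-quality checks (illustrative only:
# the sample rows and rules below are hypothetical, not from a real pipeline).

def check_not_null(rows, column):
    """Return the indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]

def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},             # null violation
        {"id": 2, "email": "b@example.com"},  # duplicate id
    ]
    print(check_not_null(sample, "email"))  # [1]
    print(check_unique(sample, "id"))       # [2]
```

In a real suite, each check would be wrapped in a pytest test and parameterized per table and column, so failures surface in CI with the offending rows identified.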
Other Skills
- Strong leadership and mentoring abilities to guide teams, foster collaboration, and align everyone with project goals.
- Excellent analytical and problem-solving skills to identify, analyze, and resolve complex issues efficiently and innovatively.
- Clear communication of technical concepts, simplifying complex ideas for both technical and non-technical stakeholders.
- Proficient in stakeholder management, building strong relationships, addressing concerns, and maintaining transparent communication throughout projects.
- Expertise in documentation, ensuring accurate and thorough records that clearly communicate processes, requirements, and outcomes.
- Attention to detail, ensuring high-quality, consistent, and accurate work.
- Strong team collaboration skills to work effectively within cross-functional groups, fostering a collaborative, knowledge-sharing environment.
Responsibilities
- Lead, develop, and execute comprehensive test plans, test cases, test scripts, and scenarios to validate the functionality, reliability, performance, accuracy, and integrity of data for ETL/ELT pipelines and cloud infrastructure, and contribute improvements to frameworks, tools, processes, and best practices in a timely manner.
- Lead testing efforts for critical data processing systems built on GCP services including:
- Oversee the testing of Cloud Composer workflows and Directed Acyclic Graphs (DAGs), ensuring their reliability, efficiency, and accuracy in orchestrating data workflows.
- Lead testing efforts for BigQuery data warehousing solutions to verify data accuracy, query performance, and overall data handling within cloud environments.
- Manage testing for ETL processes and data transformations, ensuring the proper extraction, transformation, and loading of data across systems.
- Ensure testing of data governance and security measures to validate compliance, data privacy, and secure access control.
- Develop and maintain automated test frameworks for data quality validation, pipeline performance testing, integration testing of cloud services, and end-to-end testing of data workflows.
- Create and execute test plans:
- To verify the accuracy, consistency, and integrity of data across systems.
- To assess the system's ability to scale and maintain performance under varying conditions.
- To evaluate security protocols and access controls, ensuring systems remain secure and compliant with privacy regulations.
- To validate disaster recovery strategies, ensuring systems can recover quickly and effectively in case of failure.
- Oversee the management of test environments and ensure the generation of appropriate test data to accurately simulate production scenarios for thorough testing.
- Integrate continuous testing practices into CI/CD pipelines, enabling automated and ongoing validation of data pipelines.
- Perform root cause analysis of data discrepancies, propose solutions, and collaborate with cross-functional teams to ensure data quality and integrity.
- Design, implement, and maintain automated testing frameworks and tools to streamline the QA process and improve efficiency.
- Monitor changes to ETL/ELT and BI tools, ensuring no negative impact on data integrity, and contribute to continuous improvements in QA processes.
- Collaborate with stakeholders to define data quality standards, guidelines, and metrics, and guide team members on data quality-related issues.
- Participate in all phases of the software development life cycle, including test planning, bug tracking (e.g., Jira), and process improvements to enhance testing efficiency and automation.
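Several of the responsibilities above (verifying data accuracy and consistency across systems, root-cause analysis of discrepancies) often reduce to source-versus-target reconciliation. The sketch below shows the idea with hypothetical in-memory rows; in practice the row sets would come from queries against, e.g., BigQuery source and target tables.

```python
# Simplified sketch of a source-vs-target reconciliation check of the kind
# used to validate ETL pipelines (hypothetical data; real checks would pull
# rows from the source system and the loaded target, e.g. via BigQuery).

def reconcile(source_rows, target_rows, key="id"):
    """Compare two row sets by key; report missing, unexpected, and mismatched records."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    missing = sorted(set(src) - set(tgt))      # extracted but never loaded
    unexpected = sorted(set(tgt) - set(src))   # loaded but never extracted
    mismatched = sorted(
        k for k in set(src) & set(tgt) if src[k] != tgt[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}

if __name__ == "__main__":
    source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
    target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
    print(reconcile(source, target))
    # {'missing': [3], 'unexpected': [], 'mismatched': [2]}
```

A nonempty report would feed directly into the root-cause-analysis loop described above: each key in `missing` or `mismatched` points at a specific record to trace back through the pipeline.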