Job Description
About Fusemachines
Fusemachines is a 10+ year old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, PhD, an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.
About the role
This is a remote, 3-month contract position responsible for designing, building, maintaining, and optimizing the infrastructure required for data integration (batch and real-time), storage (including databases and data modeling), processing, and analytics (BI, visualization, and advanced analytics) using Microsoft Azure in the Media Industry (advertising, marketing, and public relations).
The Data Engineer works closely with cross-functional teams to support business objectives, serves as an Azure Cloud solutions subject matter expert (SME) on business logic, and collaborates with the Solutioning team on solution design.
Qualifications & Experience
- Must have a full-time Bachelor's degree in Computer Science or similar from a top-tier school.
- At least 5 years of experience in data engineering, ETL development, and database management, with strong expertise in Azure; experience in the Media Industry preferred.
- Strong expertise in Scala programming, with a specific focus on working with Apache Spark for large-scale data processing on Azure Synapse.
- 5+ years of experience with Azure DevOps, Azure Cloud Platform, and other hyperscalers.
- Proven experience as a data engineer delivering projects with data and analytics tools and technologies.
- Experience with delivering on business application data requirements.
- Expertise in SQL and building ETL pipelines, including experience with: Azure Data Factory, Azure Databricks, Azure Stream Analytics, Azure Event Hubs.
- Expertise in databases and data warehousing, including experience with: Kimball Methodology, Azure SQL, Azure Synapse.
- Experience working on a Scrum Team in an Agile environment and following CI/CD principles.
- The following certifications are nice to have:
  - Microsoft Certified: Azure Fundamentals
  - Microsoft Certified: Azure Data Engineer Associate
  - Databricks Certified Associate Developer for Apache Spark
  - Databricks Certified Data Engineer Associate
Required skills/Competencies
- Highly proficient in Scala and Spark SQL, with advanced coding techniques for data integration, storage, processing, and optimization, specializing in developing high-performance Scala code for data engineering and analytics applications.
- Proficient in database design, advanced SQL, and optimization techniques.
- Deep knowledge of SDLC, Agile, and DevOps principles, with experience in Azure DevOps, GitHub, CI/CD, and IaC.
- Strong in Microsoft Azure data and analytics tools (Data Factory, Databricks, Synapse Analytics, Power BI, etc.) and cloud security practices.
- Expertise in data modeling, database design, data warehousing, and ETL/ELT frameworks in Azure.
- Experienced in scalable data processing (Scala Spark, Event Hubs) and orchestration (Apache Airflow).
- Proficient in troubleshooting, debugging, data quality, and governance.
- Effective communicator, collaborator, and leader, with strong problem-solving and strategic thinking.
- Committed to staying up to date with Azure, data engineering trends, and best practices.
- Highly organized, agile, adaptable, and detail-oriented, with a focus on continuous learning.
Responsibilities
- Design, develop, and maintain scalable data architectures and pipelines in Azure using Scala Spark, Data Factory, Databricks, and Synapse Analytics.
- Ensure efficient batch and real-time data integration, storage, and processing for reliable ETL/ELT operations.
- Manage and optimize Azure database systems, structured/unstructured data in Data Lake Storage, and database schemas.
- Develop real-time solutions with Azure Stream Analytics, Event Hubs, and Functions; build batch processes for large datasets.
- Serve as an SME on business logic, code quality, and performance standards (Scala Spark, Spark SQL).
- Collaborate with Solutioning, Product, and Engineering teams to align data solutions with business needs.
- Mentor junior engineers and foster team growth.
- Implement data governance, quality assurance, validation, and security frameworks.
- Deliver on analytics needs (descriptive, diagnostic, predictive, prescriptive).
- Engage in Agile processes and continuous improvement, and stay updated on industry trends and best practices.