
Purdue University

Senior Data Science Engineer

🇺🇸 West Lafayette, IN

🕑 Full-Time

💰 TBD

💻 Data Science

🗓️ October 7th, 2025

ETL MySQL PostgreSQL

Edtech.com's Summary

Purdue is hiring a Senior Data Science Engineer to develop scalable data pipelines and modeling workflows that support digital twins for the Birck Nanotechnology Center and SMART USA Institute partners. The role involves ensuring data quality, designing data architectures, implementing ontologies, and supporting advanced computational methods for research, education, and workforce development.

Highlights
  • Develop and maintain scalable data pipelines and workflows for digital twin data.
  • Design data architectures, ontologies, and knowledge graphs for data integration.
  • Prepare AI-ready datasets and optimize data workflows for analytics and dashboards.
  • Experience with relational databases such as MySQL, PostgreSQL, or MS SQL Server.
  • Proficient in Python, SQL, and data transformation tools.
  • Familiarity with visualization tools like Tableau or Power BI.
  • Bachelor's degree in Engineering, Computer Science, Physical Science, Data Science, or related field with 4+ years relevant experience.
  • Knowledge of data quality, lineage, governance frameworks, and FAIR data principles.
  • Helpful skills include ontology design, AI/ML data preparation, cloud platforms, and big data technologies.
  • Position has a 3-year duration, is exempt from overtime, and requires a background check; Purdue does not sponsor employment authorization.

Senior Data Science Engineer Full Description

Job Summary
The Network for Computational Nanotechnology (NCN) is charged with providing computational, modeling, and data infrastructure to support the creation of digital twins (DTs) by the SMART USA Institute.  These services will support DTs for the Birck Nanotechnology Center (BNC) and other SMART USA partners in research, development, and education/workforce development (EWD) efforts.  The services will involve collecting data from the source (e.g., BNC equipment and simulations), making the data AI-ready following FAIR principles, seamlessly connecting the data to AI models and visualization to inform decision-making, and ultimately publishing, storing, and sharing it.  Importantly, decision-making includes using real-time information from multiple sources to update digital twins and using their forecasting ability to provide feedback to experimentalists.  The work will leverage resources from Purdue's Rosen Center for Advanced Computing (RCAC) and nanoHUB.

The Senior Data Science Engineer will work with the NCN team to develop scalable data pipelines and modeling workflows, ensure data quality and governance, design data architectures, and implement ontologies and knowledge graphs.  The role will also prepare AI-ready datasets, optimize data workflows for analytics and dashboards, and support decision-making tools that leverage advanced computational methods.  The position will design, build, and maintain advanced data infrastructure supporting research, education, and workforce development initiatives.  This position will enable reliable data collection, storage, processing, and delivery for scientific, engineering, and AI/ML applications.  The position is expected to have a 3-year duration and be renewable.
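For illustration only (not part of the posting), a raw > cleaned > enriched ingestion step of the kind described above might look like the minimal Python sketch below; the input file, column names, and SQLite target are assumed placeholders rather than details of the role.

    import sqlite3
    from datetime import datetime, timezone

    import pandas as pd

    # Hypothetical raw export from a piece of lab equipment (file and columns are assumed).
    raw = pd.read_csv("tool_log_raw.csv")  # e.g. timestamp, tool_id, temperature_c

    # Clean: drop rows with missing readings and discard out-of-range temperatures.
    cleaned = raw.dropna(subset=["timestamp", "temperature_c"])
    cleaned = cleaned[cleaned["temperature_c"].between(-50, 500)]

    # Enrich: attach simple provenance fields so each record stays traceable (FAIR-style).
    enriched = cleaned.assign(
        source_file="tool_log_raw.csv",
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )

    # Load into a relational store for downstream models, dashboards, and sharing.
    with sqlite3.connect("digital_twin.db") as conn:
        enriched.to_sql("tool_readings", conn, if_exists="append", index=False)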

What We're Looking For:
  • Bachelor's degree in Engineering, Computer Science, Physical Science, Data Science or a related field
  • Four or more years of experience in data engineering, database development, or data pipeline implementation, including: building ETL/ELT pipelines to transform raw > cleaned > enriched data, database schema design and management for relational or non-relational systems, and implementing data quality, lineage, and governance frameworks (a brief illustrative sketch follows this list).
  • Consideration will be given to an equivalent combination of required education and related experience
  • Demonstrated experience with one or more relational database systems such as MySQL, PostgreSQL, or MS SQL Server
  • Strong programming skills in Python and SQL, along with experience using related data transformation tools
  • Experience with data architecture design, partitioning, indexing, and retention policies for performance and scalability
  • Familiarity with visualization/BI tools such as Tableau or Power BI
  • Ability to collaborate with domain experts to define metadata plans and integrate diverse data sources
  • Knowledge of data lifecycle management, archival processes, and data security principles
  • Ability to quickly understand new technology requirements and demonstrate skills learned
  • Excellent oral, written, and electronic communication skills, with strong analytical and troubleshooting abilities
  • Ability to multitask across a variety of activities and work effectively on multiple deadline-driven tasks
  • Self-motivated with the ability to think and work independently
  • Demonstrated ability to work with others
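As a hedged illustration of the data quality and lineage work listed above (again, not from the posting; the table and column names carry over from the hypothetical sketch in the Job Summary), a lightweight quality check might look like:

    import sqlite3

    import pandas as pd

    def run_quality_checks(db_path: str = "digital_twin.db") -> dict:
        """Report basic quality metrics for the hypothetical tool_readings table."""
        with sqlite3.connect(db_path) as conn:
            df = pd.read_sql_query("SELECT * FROM tool_readings", conn)

        return {
            "row_count": len(df),
            "null_timestamps": int(df["timestamp"].isna().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
            "temperature_out_of_range": int((~df["temperature_c"].between(-50, 500)).sum()),
        }

    if __name__ == "__main__":
        print(run_quality_checks())

In practice a check like this would extend to schema validation, lineage capture, and governance reporting against MySQL or PostgreSQL rather than a local SQLite file.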

What is Helpful:
  • Advanced degree in Engineering, Data Science, or Physical Sciences discipline
  • Experience designing and implementing ontologies, taxonomies, or knowledge graphs
  • Familiarity with AI/ML data preparation, including anomaly detection for process control data sets
  • Knowledge of FAIR (Findable, Accessible, Interoperable, Reusable) data principles
  • Background with cloud data platforms (AWS, Azure, GCP) or big data technologies (Spark, Hadoop)
  • Experience integrating agentic AI systems with controlled access to datasets and analytical tools
  • Ability to scope, evaluate, and deploy commercial data management or analytics solutions
  • Experience with large-scale scientific or engineering data workflows
  • Specialized skills such as big data technologies, dynamic web programming, or speculative/exploratory data-driven analysis

What We Want You To Know:
  • Purdue will not sponsor employment authorization for this position
  • A background check is required for employment in this position
  • FLSA: Exempt (Not eligible for overtime)
  • Retirement Eligibility: Defined Contribution Waiting Period
  • Purdue University is an EO/EA University.