Data Engineer - US Hybrid

Siemens

Location

Huntsville, TX

Salary

$109,800 - $197,700

Type

Full-Time

Experience

Entry Level

Required Skills

Python, SQL

Job Description

We are a leading global software company dedicated to the world of computer-aided design, 3D modeling, and simulation, helping innovative global manufacturers design better products, faster! With the resources of a large company and the energy of a software start-up, we have fun together while creating a world-class software portfolio. Our culture encourages creativity, welcomes fresh thinking, and focuses on growth, so our people, our business, and our customers can achieve their full potential.


Experience Level: 8 years of enterprise data engineering


About the Role


We are seeking a highly skilled and experienced Data Engineer to join our growing data team. The ideal candidate will be a technical specialist who is passionate about designing, building, and optimizing scalable, reliable, and high-performance data infrastructure. This role is crucial in architecting our next-generation data platform to unify data warehousing and data lake capabilities.


You will be responsible for creating robust data pipelines, managing diverse database technologies, and ensuring high data quality for our Data Scientists, Analysts, and business stakeholders.


Key Responsibilities


Data Engineering & Architecture


* Design, implement, and optimize the overall data architecture, with a strong focus on the Lakehouse paradigm (e.g., using Databricks/Delta Lake, Microsoft Fabric, or equivalent cloud-native solutions).
* Develop and manage data models (dimensional, relational, or NoSQL) for both transactional and analytical systems, ensuring efficiency and scalability.
* Successfully migrate or integrate data from legacy systems and disparate sources into the modern Lakehouse environment.
* Monitor, tune, and optimize data storage, compute costs, and query performance across the data platform.


Data Pipeline Development (ETL/ELT)


* Design, build, and maintain robust, scalable, and fault-tolerant ETL/ELT data pipelines for batch and real-time data ingestion and transformation.
* Integrate data from a variety of sources, including transactional databases, APIs, message queues (e.g., Kafka), and external SaaS platforms.
* Implement data quality checks, validation rules, and data governance policies within the pipelines to ensure data reliability and compliance.
* Use workflow orchestration tools (e.g., Apache Airflow, Azure Data Factory, AWS Glue) to automate and manage complex data workflows.


Database Management


* Demonstrate strong working knowledge of and hands\-on experience with various database management systems (DBMS).
  + Relational Databases (SQL): PostgreSQL, MySQL, SQL Server, or cloud-based relational services (e.g., AWS RDS, Azure SQL Database).
  + NoSQL Databases: Experience with one or more NoSQL types (e.g., Document databases like MongoDB/Cosmos DB, Key-Value stores, Graph databases like Neo4j, or Columnar databases like Cassandra).
* Optimize database schemas and write complex, efficient SQL queries and stored procedures for data manipulation and retrieval.


Collaboration & Operations


* Collaborate closely with Data Scientists and Data Analysts to deliver high-quality, feature-rich datasets that support advanced analytics and Machine Learning (ML) models.
* Establish and maintain Continuous Integration/Continuous Deployment (CI/CD) practices for all data\-related infrastructure and code.
* Develop comprehensive technical documentation on data pipelines, data models, and platform architecture.
* Ensure data security, access control, and compliance with data privacy regulations (e.g., GDPR, HIPAA).


Required Skills and Qualifications


* Bachelor's degree in Computer Science, or equivalent.
* 6-8 years of hands-on experience in a dedicated Data Engineering role.
* Expert-level proficiency in SQL and at least one high-level programming language, such as Python or Scala, used for data manipulation and engineering tasks.
* Proven experience in designing and managing data platforms using a Lakehouse architecture (e.g., Databricks/Delta Lake, Apache Hudi, Apache Iceberg, or similar cloud-native lakehouse services).
* Solid understanding of cloud platforms (Azure, AWS, or GCP) and their relevant data services (e.g., S3/ADLS/GCS for storage, Spark services), preferably Azure.
* In-depth knowledge of database fundamentals, including schema design, performance tuning, and practical experience with both Relational and NoSQL databases.
* Familiarity with distributed processing frameworks (e.g., Apache Spark) for handling large-scale data transformation.
* Experience implementing and maintaining automated ETL/ELT data pipelines and utilizing data orchestration tools.
* Strong understanding of data modeling techniques (e.g., Star Schema, Data Vault).
* Familiarity with MLOps.


Preferred Qualifications


* Experience with real-time streaming technologies (e.g., Apache Kafka, Kinesis, Pub/Sub).
* Familiarity with Infrastructure as Code (IaC) tools like Terraform.
* Experience in MLOps and serving production-ready data to ML systems.

Posted: 2026-03-24