Lead RTL Design Engineer


Location

Austin, TX

Salary

$160,000 - $250,000

Type

Full-Time

Experience

Senior Level

Job Description

Efficient is developing the world's most energy\-efficient general\-purpose computer processor. Efficient's patented technology uses 100x less energy than state\-of\-the\-art, commercially available ultra\-low\-power processors and is programmable using standard high\-level programming languages and AI/ML frameworks. This level of efficiency makes perpetual, pervasive intelligence possible: run AI/ML continuously on a AA battery for 5\-10 years. Our platform's unprecedented level of efficiency enables IoT devices to intelligently capture and curate first\-party data to drive the next major computing revolution.


We are looking for a Lead RTL Design Engineer to own microarchitecture definition and RTL implementation across the dataflow execution fabric, memory subsystem, on\-chip interconnect/NoC, low\-power logic, and standard peripheral IP (RISC\-V, NVM, I2S, I2C) integration. You will work from architecture spec through synthesis\-ready RTL, collaborating with architects, microarchitects, DV leads, physical design, and firmware teams to tape out an industry\-leading power\-efficient SoC.



This is a unique opportunity to be a part of a newly formed HW engineering org and have an influence on our products and processes as we move from the initial stages of product development to market release and scaled volume production. Join our team and help us shape the future of computing at the edge and beyond!


**Key Responsibilities**


* **Microarchitecture definition (core focus):** Own the design and definition of processor and compute\-unit microarchitecture, including dataflow pipelines, execution units, and interfaces. Set performance, power, and area targets, and guide the team toward achieving them.
* **On\-chip interconnects and system integration:** Define and drive the design of on\-chip networks and data movement across the fabric, balancing performance, scalability, and implementation constraints in collaboration with physical design.
* **Memory subsystem \& system architecture:** Define the interface to the memory subsystem, including data movement, ordering, and synchronization behavior, ensuring a clean and scalable model for software and future system expansion.
* **Reconfiguration and execution model:** Lead the architecture of configuration, scheduling, and execution of workloads on the fabric, including multi\-kernel support and interaction with host systems.
* **Power management and low\-power design:** Drive power architecture across the design, including clocking, reset, power domains, and low\-power strategies to meet aggressive energy and efficiency goals.
* **HW/SW co\-design:** Collaborate closely with compiler and software teams to define the hardware execution model, ensuring efficient mapping of workloads onto the architecture.


* **Specifications, Documentation, and Reviews:** Author and own uArch specification documents for assigned blocks; drive design reviews with architecture, compiler, DV, and physical design stakeholders.


* **Mentoring and process improvement:** Mentor senior and junior RTL engineers; review RTL, flag microarchitecture risks, and enforce coding style and lint\-clean standards across the team.
* **Driving PPA Metrics:** Participate in PPA analysis loops: synthesize blocks regularly, review area/timing/power reports, and make data\-driven tradeoffs against performance and feature requirements.
* **DV Collaboration:** Collaborate with DV leads to define/review verification plans; provide directed test scenarios for graph execution corner cases, back\-pressure conditions, and power state transitions.
* **Silicon Bring\-up:** Contribute scan/ATPG guidelines, review DFT insertion, and provide RTL\-level debug assistance during lab validation.


**Required Qualifications \& Experience**


* 8\+ years of RTL design experience with tape\-out ownership of dataflow\-based designs, on\-chip networks, memory subsystems, or peripheral integration on a processor or accelerator SoC.
* Deep proficiency in SystemVerilog for RTL — synthesis\-clean, lint\-clean, timing\-aware; able to design complex state machines, arbiters, token flow controllers, and datapath logic from scratch.
* Solid understanding of parallel execution models: dataflow, SIMD, or systolic array architectures; familiarity with the hardware challenges of token\-based firing\-rule evaluation and producer\-consumer synchronization.
* Hands\-on experience with on\-chip memory design: SRAM wrappers, scratchpad/TCM, banking, and memory\-mapped register interfaces.
* Experience with low\-power RTL techniques: UPF\-driven flows, clock gating, power domains, retention registers, and AON wakeup logic.
* Familiarity with at least one standard on\-chip bus protocol (AXI, AHB, APB, TileLink, or NoC equivalent) at the RTL implementation level.
* Experience taking RTL through synthesis and timing closure; ability to read and act on SDC constraints, STA reports, and synthesis QoR summaries.
* Strong written

Posted: 2026-04-30