Performance Engineer
Company: Etched
Location: San Jose
Posted on: April 2, 2026
Job Description:
About Etched

Etched is building the world’s first AI inference system
purpose-built for transformers, delivering over 10x higher
performance and dramatically lower cost and latency than a B200.
With Etched ASICs, you can build products that would be impossible
with GPUs, like real-time video generation models and extremely
deep & parallel chain-of-thought reasoning agents. Backed by
hundreds of millions from top-tier investors and staffed by leading
engineers, Etched is redefining the infrastructure layer for the
fastest-growing industry in history.

Key responsibilities

- Develop comprehensive performance models and projections for
  Sohu's transformer-specific architecture across varying workloads
  and configurations
- Profile and analyze deep learning workloads on Sohu to identify
  micro-architectural bottlenecks and optimization opportunities
- Build analytical and simulation-based models to predict
  performance under different architectural configurations and
  design trade-offs
- Collaborate with hardware architects to inform
  micro-architectural decisions based on workload characteristics
  and performance analysis
- Drive hardware/software co-optimization by identifying
  opportunities where architectural features can unlock significant
  performance improvements
- Characterize and optimize memory hierarchy performance,
  interconnect utilization, and compute resource efficiency
- Develop performance benchmarking frameworks and methodologies
  specific to transformer inference workloads

Key responsibilities

- Build detailed roofline models and performance projections for
  Sohu across diverse transformer architectures (Llama, Mixtral,
  etc.)
- Profile production inference workloads to identify and eliminate
  micro-architectural bottlenecks
- Analyze memory bandwidth, compute utilization, and interconnect
  performance to guide next-generation architecture decisions
- Develop performance modeling tools that predict chip behavior
  across different batch sizes, sequence lengths, and model
  configurations
- Characterize the performance impact of architectural features
  like specialized datapaths, memory hierarchies, and on-chip
  interconnects
- Compare Sohu's architectural efficiency against conventional GPU
  architectures through detailed bottleneck analysis
- Inform hardware design decisions for future generations (next gen
  and beyond) based on workload analysis and performance
  projections

You may be a good fit if you have

- Deep expertise in computer architecture and micro-architecture,
  particularly for accelerators or domain-specific architectures
- Strong performance modeling and analysis skills with experience
  building analytical or simulation-based performance models
- Experience profiling and optimizing deep learning workloads on
  hardware accelerators (GPUs, TPUs, ASICs, FPGAs)
- Strong understanding of hardware/software co-design principles
  and cross-layer optimization
- Solid foundation in digital circuit design and how
  micro-architectural decisions impact performance
- Experience with reconfigurable or heterogeneous architectures
- Ability to reason quantitatively about performance bottlenecks
  across the full stack from circuits to workloads

Strong candidates may also have

- PhD or equivalent research experience in Computer Architecture or
  related fields
- Experience with ASIC, FPGA, or CGRA-based accelerator development
- Published research in computer architecture, ML systems, or
  hardware acceleration
- Deep knowledge of GPU architectures and the CUDA programming
  model
- Experience with architecture simulators and performance modeling
  tools (gem5, trace-driven simulators, custom models)
- Track record of informing architectural decisions through
  rigorous performance analysis
- Familiarity with transformer model architectures and inference
  serving optimizations

Benefits

- Medical, dental, and vision packages with generous premium
  coverage
- $500 per month credit for waiving medical benefits
- Housing subsidy of $2k per month for those living within walking
  distance of the office
- Relocation support for those moving to San Jose (Santana Row)
- Various wellness benefits covering fitness, mental health, and
  more
- Daily lunch and dinner in our office

How we’re different

Etched believes in the Bitter Lesson. We think most of the progress
in the AI field has come from using more FLOPs to train and run
models, and the best way to get more FLOPs is to build
model-specific hardware. Larger and larger training runs encourage
companies to consolidate around fewer model architectures, which
creates a market for single-model ASICs.

We are a fully in-person team in San Jose (Santana Row), and
greatly value engineering skills. We do not have boundaries between
engineering and research, and we expect all of our technical staff
to contribute to both as needed.
Keywords: Etched, Manteca, Performance Engineer, IT / Software / Systems, San Jose, California