LLM Infrastructure Engineer

Amsys Innovative Solutions LLC

Location

Houston, TX

Salary

Not listed

Type

Full-Time

Experience

Entry Level

Required Skills

Python

Job Description

We are looking for a Senior Python / AI API Engineer to build and deploy production-grade services powering Large Language Model (LLM) applications. This role focuses on developing high-performance APIs for model inference, optimizing GPU workloads, and deploying AI services in cloud environments.

This is an engineering-focused role, not research. We are looking for someone who has built and shipped AI systems into production and understands the challenges of scalable inference and model serving.

**Key Responsibilities**

* Develop high-performance APIs using Python (3.10+) and FastAPI
* Build and deploy LLM inference services using HuggingFace Transformers and PyTorch
* Optimize GPU workloads and CUDA memory usage
* Implement streaming inference APIs for real-time model responses
* Containerize and deploy services using Docker and GPU-enabled infrastructure
* Deploy AI workloads in Azure environments (AKS, ACI, or Container Apps)

**Required Skills**

* Strong Python development experience (3.10+)
* Hands-on experience building production APIs with FastAPI
* Experience with HuggingFace Transformers and PyTorch
* Solid understanding of REST API design
* Experience deploying containerized applications with Docker

**Nice to Have**

* Experience with OpenAI-compatible APIs, vLLM, or Text Generation Inference (TGI)
* Experience deploying AI workloads on Azure GPU infrastructure
* Familiarity with LoRA / PEFT fine-tuning
* Exposure to legal or financial NLP use cases

**Ideal Candidate:** A hands-on engineer who understands how LLM systems run in production, from model loading and tokenization to GPU deployment and scalable APIs.

Posted: 2026-03-30
