Development Operations Engineer

Nextlink Internet

Location

Hudson Oaks, TX

Salary

Not listed

Type

Full-Time

Experience

Entry Level

Required Skills

Python, SQL

Job Description

**Company Overview**


We are a Hudson Oaks, Texas-based Internet Service Provider (ISP) delivering High Speed Internet and Voice Services throughout multiple states to residential, business, K-12 Education and government customers. We believe there is much more to an internet company than just delivering cost-effective internet solutions; we believe in delivering an overall customer experience that our competitors simply cannot match.




**Job Summary:**


Design, build, and operate Nextlink’s CI/CD, GitOps, container, and infrastructure-as-code platforms across on‑prem datacenters and public cloud. Partner with Engineering, Field, NOC, and Security to automate workflows, improve reliability, and accelerate delivery for customer-impacting services. This role also serves as a subject matter expert for network and infrastructure devices, owning monitoring device models, standards, and pre‑production testing aligned to the Nextlink launchpad process. The current stack includes GitHub/GitLab CI, Terraform/Ansible, Docker/Kubernetes, and Grafana/Prometheus/ELK.




**Responsibilities:**


Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions.


**Monitoring Device Models \& Telemetry:**


* Develop, test, and maintain up‑to‑date device models for Nextlink monitoring systems (SNMP, API, streaming telemetry).

* Collaborate with Engineering, Field, and NOC to ensure correct monitoring, data collection, thresholds, and alert standards.




**Network Monitoring Systems (NMS) – Zabbix (Design \& Architecture):**


* Own platform architecture for Zabbix (server, proxies, and backing database, e.g., PostgreSQL with a time-series extension), including HA/failover, housekeeping, and retention policies.

* Create and maintain device templates (SNMPv3, API, JMX/IPMI/SSH as applicable) with low-level discovery (LLD), item/trigger prototypes, macros, preprocessing, and escalation logic that match Nextlink standards.

* Distributed monitoring at scale: design proxy placement and discovery to cover POPs/datacenters and edge sites; ensure secure comms (TLS, PSKs/certs) and reliable buffering.

* Alert quality & noise reduction: implement trigger dependencies, event correlation, maintenance windows, and SLA/service maps; tune thresholds from SLOs and NOC feedback.

* Automation & “Zabbix-as-Code”: manage templates, host onboarding, actions, and maintenance via the Zabbix API and Git-based workflows; integrate with CI/CD to promote monitoring changes through environments.

* Integrations: connect Zabbix to ChatOps (Teams/Slack), ticketing, and paging; publish dashboards for NOC/leadership; export metrics/events to the observability stack where useful.

* Security: enforce RBAC, SNMPv3, secret rotation, and least-privilege API tokens; document and test upgrades and rollbacks for zero/minimal downtime.




**Automation \& Network Change:**


* Identify, develop, and maintain scripts/tools to automate processes and network/device changes (Python, Bash, PowerShell).

* Enforce configuration baselines, drift detection, and golden‑config rollouts; integrate change control and approvals.




**CI/CD, GitOps \& Release Engineering:**


* Build and maintain CI/CD pipelines (reusable templates, quality gates, artifact/versioning, blue/green & canary).

* Implement GitOps for Kubernetes and network automation (Argo CD/Flux) using Helm/Kustomize and policy controls.

* Support ephemeral environments, infrastructure testing, and progressive delivery with feature flags as applicable.




**Infrastructure as Code, Datacenter \& Cloud:**


* Plan, deploy, and maintain physical servers and datacenter assets (capacity, ordering, lifecycle, firmware).

* Provision cloud resources (Azure, AWS, GCP) using Terraform with least‑privilege identities and tagging/FinOps standards.

* Implement secure networking (VNet/VPC, private endpoints, peering, DNS/TLS, load balancing, WAF).




**Observability \& SRE:**


* Own metrics, logs, traces, and profiling via Prometheus/Grafana and ELK; leverage eBPF where appropriate.

* Define SLIs/SLOs, manage error budgets, and lead incident response/post‑incident reviews alongside the NOC.




**Security, Compliance \& Supply Chain:**


* Embed DevSecOps: secret rotation, workload identity federation (OIDC), and least privilege across platforms.

* Establish software supply‑chain controls: SBOM (CycloneDX), image signing (Sigstore cosign), provenance (SLSA), and policy‑as‑code (OPA/Kyverno).

* Automate vulnerability management, patching, and CIS/NIST-aligned hardening.



**AIOps \& ChatOps:**


* Integrate AIOps for anomaly detection, noise reduction, and incident summarization; apply LLMs to enhance runbooks and root‑cause hypotheses.


* Implement ChatOps for deployments, rol

Posted: 2026-03-31