Development Operations Engineer
Job Description
**Company Overview**
We are a Hudson Oaks, Texas\-based Internet Service Provider (ISP) delivering High Speed Internet and Voice
Services throughout multiple states to residential, business, K\-12 Education and government customers. We
believe there is much more to an internet company than just delivering cost\-effective internet solutions; we
believe in delivering an overall customer experience that our competitors simply cannot match.
**Job Summary:**
Design, build, and operate Nextlink’s CI/CD, GitOps, container, and infrastructure\-as\-code platforms across
on‑prem datacenters and public cloud. Partner with Engineering, Field, NOC, and Security to automate
workflows, improve reliability, and accelerate delivery for customer\-impacting services. This role also serves
as a subject matter expert for network and infrastructure devices, owning monitoring device models, standards,
and pre‑production testing aligned to the Nextlink launchpad process. Current stack includes GitHub/GitLab
CI, Terraform/Ansible, Docker/Kubernetes, and Grafana/Prometheus/ELK.
**Responsibilities:**
Reasonable accommodations will be made to enable individuals with disabilities to perform the essential
functions.
**Monitoring Device Models \& Telemetry:**
* Develop, test, and maintain up‑to‑date device models for Nextlink monitoring systems (SNMP, API,
streaming telemetry).
* Collaborate with Engineering, Field, and NOC to ensure correct monitoring, data collection, thresholds,
and alert standards.
**Network Monitoring Systems (NMS) – Zabbix (Design \& Architecture):**
* Own platform architecture for Zabbix (server, proxies, and backing database, e.g., PostgreSQL with a
time\-series extension), including HA/failover, housekeeping, and retention policies.
* Create and maintain device templates (SNMPv3, API, JMX/IPMI/SSH as applicable) with low\-level
discovery (LLD), item/trigger prototypes, macros, preprocessing, and escalation logic that match
Nextlink standards.
* Distributed monitoring at scale: design proxy placement and discovery to cover POPs/datacenters and
edge sites; ensure secure comms (TLS, PSKs/certs) and reliable buffering.
* Alert quality \& noise reduction: implement trigger dependencies, event correlation, maintenance
windows, and SLA/service maps; tune thresholds from SLOs and NOC feedback.
* Automation \& “Zabbix\-as\-Code”: manage templates, host onboarding, actions, and maintenance via the
Zabbix API and Git\-based workflows; integrate with CI/CD to promote monitoring changes through
environments.
* Integrations: connect Zabbix to ChatOps (Teams/Slack), ticketing, and paging; publish dashboards for
NOC/leadership; export metrics/events to the observability stack where useful.
* Security: enforce RBAC, SNMPv3, secret rotation, and least\-privilege API tokens; document and test
upgrades and rollbacks for zero/minimal downtime.
**Automation \& Network Change:**
* Identify, develop, and maintain scripts/tools to automate processes and network/device changes (Python,
Bash, PowerShell).
* Enforce configuration baselines, drift detection, and golden‑config rollouts; integrate change control and
approvals.
**CI/CD, GitOps \& Release Engineering:**
* Build and maintain CI/CD pipelines (reusable templates, quality gates, artifact/versioning, blue/green \&
canary).
* Implement GitOps for Kubernetes and network automation (Argo CD/Flux) using Helm/Kustomize and
policy controls.
* Support ephemeral environments, infrastructure testing, and progressive delivery with feature flags as
applicable.
**Infrastructure as Code, Datacenter \& Cloud:**
* Plan, deploy, and maintain physical servers and datacenter assets (capacity, ordering, lifecycle,
firmware).
* Provision cloud resources (Azure, AWS, GCP) using Terraform with least‑privilege identities and
tagging/FinOps standards.
* Implement secure networking (VNet/VPC, private endpoints, peering, DNS/TLS, load balancing,
WAF).
**Observability \& SRE:**
* Own metrics, logs, traces, and profiling via Prometheus/Grafana and ELK; leverage eBPF where
appropriate.
* Define SLIs/SLOs, manage error budgets, and lead incident response/post‑incident reviews alongside
the NOC.
**Security, Compliance \& Supply Chain:**
* Embed DevSecOps: secret rotation, workload identity federation (OIDC), and least privilege across
platforms.
* Establish software supply‑chain controls: SBOM (CycloneDX), image signing (Sigstore cosign),
provenance (SLSA), and policy‑as‑code (OPA/Kyverno).
* Automate vulnerability management, patching, and CIS/NIST\-aligned hardening.
**AIOps \& ChatOps:**
* Integrate AIOps for anomaly detection, noise reduction, and incident summarization; apply LLMs to
enhance runbooks and root‑cause hypotheses.
* Implement ChatOps for deployments, rol
Posted: 2026-03-31