Texas A&M University — M365 Site Reliability Engineer

Our Commitment
Texas A&M University is committed to enriching the learning and working environment for all visitors, students, faculty, and staff by promoting a culture that embraces inclusion, diversity, equity, and accountability.  Diverse perspectives, talents, and identities are vital to accomplishing our mission and living our core values.

Who we are
The Division of Information Technology provides reliable and accessible IT services to elevate and enhance Texas A&M University. We provide IT leadership to the campus community while enabling the research, education and service mission of Texas A&M. With trusted services and innovative solutions, we are changing the technology landscape on campus.  To learn more about IT at Texas A&M University visit us at: https://it.tamu.edu/

What we want
The Senior IT Professional II (Site Reliability Engineer II), under general direction, provides technical leadership for multiple complex unit-level projects or operations utilizing multiple technical fields. Develops technical standards for implementing unit projects or operations. May coordinate the technical activities of support or project teams. Serves as a chair or senior member of a technical information resource team responsible for planning and setting technical standards and direction. The selected candidate will subscribe to and support our commitment to Inclusion, Diversity, Equity and Accountability (IDEA) as stated above. If this job description sounds interesting to you, we invite you to apply to be considered for this opportunity.

What you need to know

Cover Letter/Resume: A cover letter and resume are strongly recommended. You may upload these documents on the application under CV/Resume.

Position Details: Hiring restrictions in compliance with System Policy 15.02 Export Controls: Must be a United States citizen, permanent resident, or a person granted asylum or refugee status in accordance with 15 CFR, Part 762; 22 CFR §§122.5, 123.22 and 123.26; and 31 CFR § 501.601.

COVID-19 information: Texas A&M University monitors local, state and federally mandated health guidelines to keep students, employees, prospective employees, and visitors as safe as possible. For the latest information regarding Texas A&M’s COVID-19 response, please visit the University’s COVID-19 website. For COVID-19 employment-related information, please visit the Division of Human Resources and Organizational Effectiveness’ COVID-19 website.

Required Education and Experience:

  • Bachelor’s degree in applicable field or equivalent combination of education and experience
  • Eight years of experience in multiple technology areas such as system administration, DevOps, collaborative software development, customer support, application support, project management, database administration, system reporting, access management, system security, and/or disaster recovery

Required Knowledge, Skills, and Abilities:

  • Must be able to work in a collaborative team environment
  • Ability to multitask and work cooperatively with a diverse range of people
  • Must have strong interpersonal skills
  • Knowledge of word processing and spreadsheet applications
  • Excellent written communication, analytical, interpersonal, and organizational skills

Preferred Education and Experience:

  • Degree in information technology or related field
  • Experience deploying system architectures in public cloud environments (Google Cloud, Azure, and AWS)
  • Programming experience with at least one of the following languages: Node.js, Python, Ruby, Go, PowerShell, or Bash
  • Knowledge of and experience using databases, particularly MySQL or MS-SQL
  • Knowledge of and experience with data analysis
  • Knowledge of and experience writing REST APIs
  • Knowledge of and experience consuming cloud web services (Azure, Microsoft Graph, and Google APIs in particular)
  • Experience with at least one of the following automation technologies: Chef, Ansible, or Puppet
  • Experience, including actual pull requests, with GitHub or GitLab
  • Knowledge of and experience with CI/CD methodologies
  • Knowledge of and experience with Microsoft, Linux, and Mac operating systems (Windows Server 2012, 2016, and 2019; Windows 10; CentOS; Mac OS X)
  • Knowledge of and experience with Microsoft Active Directory and OpenLDAP
  • General familiarity with network protocols and theory (e.g., TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, and load balancing)
  • General familiarity with principles of project management and service management frameworks (e.g., ITIL/ITSM)
  • Knowledge of and experience with DevOps methodologies
  • Knowledge of and experience with Docker, containers, and related technologies

Preferred Licenses and Certifications:

  • ITIL Foundations, PMP

Preferred Knowledge, Skills, and Abilities:

  • Experience with HashiCorp Terraform, Packer, and Vault
  • Experience writing Infrastructure as Code (using Terraform or Pulumi)
  • Knowledge of and experience with Kubernetes on-premises and in one or more public clouds (AWS, GCP, Azure)
  • Advanced cross-disciplinary IT skills, advanced analysis and troubleshooting/problem-solving skills, client relations skills, requirement assessment and analysis, project management methodology, an understanding of context and interrelationships, and proficiency in ITIL
  • Experience with Objectives and Key Results (OKR) methodologies is highly desirable

Responsibilities:

  • Resolution of Technical Problems – Provides technical oversight and training for conducting research into cloud compute problems and formulating recommended solutions for cloud-centric customers. Serves as a cloud computing resource for internal and external customers. Assists in strategic planning for cloud computing and related communications. Develops budget plans and cost projections for deployment into cloud environments.
  • Develop Infrastructure with Code and Automation – Works with other technical staff to develop infrastructure-as-code to facilitate deployment of virtualized and ephemeral infrastructure services in public cloud and in hybrid on-premises environments. Maintains, develops, and documents procedural scripts used to maintain infrastructure services. Develops code and systems to automate traditionally manual and repetitive processes.
  • Automated Administration and Execution of Coded Infrastructure – Builds, implements, and configures public cloud and on-premises cloud-like resources and platforms with code. Orchestrates the configuration and deployment of multi-part and highly interdependent systems using Infrastructure-as-Code technologies such as Ansible, Chef, Puppet, and/or Terraform as needed for the particular platform. Automates the provisioning of accounts, groups, settings, and configurations of cloud platforms and services such as Google Workspace, Microsoft 365, AWS, Google Cloud, and Azure, using modern APIs and Infrastructure-as-Code technologies such as Ansible, Chef, Puppet, PowerShell, Bash/Zsh, and/or Terraform. Uses GitHub to implement CI/CD pipeline automation that supports collaborative and verifiable workflows and ensures accurate and repeatable administration of systems and platforms. Provides technical guidance and oversight for virtualized or ephemeral computing resources and cloud platform administration. Conducts server and platform performance analyses and tuning with code. Coordinates routine audits of systems and software with automation. Oversees and coordinates the use of data analytics and predictive determination in system logs. Implements security control code libraries to meet compliance requirements for cloud-based computer systems, platforms, and environments. Develops disaster recovery plans for multi-part and highly interdependent systems. Coordinates and monitors the problem management process, including automation backup support.
  • Cloud Project Planning – May coordinate the technical activities of a project team.  Completes reports and summaries for management and/or users including status reports, problem reports, progress summaries, and system utilization reports.
  • Security – Builds and configures software-based systems to implement IT security controls to meet TAMU, TAMU System, State, and Federal compliance requirements. Using automation and Infrastructure as Code tools such as Ansible, Chef, and Terraform, writes code and automates the deployment of security configurations that facilitate the protection of infrastructure and applications. Also writes code to ensure that security settings and compliance are maintained and monitored to prevent drift from the intended stance or degradation of protective posture.
  • Support – Troubleshoots multi-part and highly interdependent cloud-based networks, computing systems, and software applications. Provides Tier III support. Oversees the process used to document server support methods, procedures, and configuration. Coordinates the evaluation of new technologies.
  • Innovation and Collaboration with Colleagues and Customers – Serves as a senior member of an information resource team responsible for setting technical direction in cloud-based initiatives. Works with technically focused campus colleagues to collaborate on cloud architecture strategies, methods, and solutions to emerging and persistent problems. Participates in the request for discussion process as needed. Serves as a liaison to university groups or committees.
  • Professional Development – Participates in training and professional development sessions. Performs other duties as assigned.

Instructions to Applicants: Applications received by Texas A&M University must either have all job application data entered or a resume attached. Failure to provide all job application data or a complete resume could result in an invalid submission and a rejected application. We encourage all applicants to upload a resume or use a LinkedIn profile to pre-populate the online application. 

All positions are security-sensitive. Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check.

Equal Opportunity/Affirmative Action/Veterans/Disability Employer committed to diversity.