A dedicated problem solver committed to completing projects in both the short and long term, involving all flavors of Linux, whether in the cloud or within reach. I've been engineering solutions, racking and stacking for a decade. I'm always hungry to learn something new and figure something out.
Site Reliability Engineer, Oracle
January 2022 - January 2023
Assisted in growing an SRE team from the ground up for a brand new product centered around a law enforcement recording, evidence and administration ecosystem running in Oracle Cloud Infrastructure with an emphasis on Oracle Linux.
- From zero dashboards and analytics to many: Brought together Prometheus, Grafana (via OCI Metrics plugin), OCI Monitoring and PromQL/MQL to build feature-rich dashboards for leadership, support and engineers to troubleshoot and diagnose issues. This ranged from tuning existing exporters to creating textfile collectors for Prometheus to gather custom metrics.
- While working with leadership, helped define and track product KPIs, as well as define, visualize and track SLIs and SLOs in Grafana and existing OCI Monitoring tools.
- Met with non-SRE teams and individuals to train them in using Grafana and existing OCI Monitoring tools.
- Brought up and tuned alerting where there was none, particularly by setting up Slack alerts via Alertmanager.
- Pushed infrastructure changes through Bitbucket and Terraform pipelines.
- Participated in collaborative tabletop exercises, helping identify and sharpen our roles and find multiple points of improvement in our procedures.
- Administered and engineered Kubernetes clusters across multiple geographic regions and security clearance levels.
- Identified and reduced overhead and SRE toil, reclaiming ~2 team hours a week through creation and expansion of oracle-python-sdk and Python/Bash shell scripts. This ranged from automating tenancy administration via Oracle CLI (oci-cli) to meeting with SRE members to divide and conquer any repetitive tasks we could better automate.
- Led and underwent multiple security audits and blameless postmortems, helping identify and mitigate product and Oracle Linux-specific vulnerabilities.
- Thoroughly documented new and existing procedures in Confluence.
- Participated in and provided support for SRE on-call rotation, working collaboratively with team members to ensure timely response and resolution of critical incidents.
Platform Engineer, CoreDial
June 2019 - January 2022
Part of a lean Linux engineering team responsible for much of CoreDial's VoIP telephony platform and infrastructure with exemplary use of FOSS technologies.
- Part of a team helping maintain a highly available VoIP and hosted PBX platform across open-source technologies such as FreeSWITCH and Asterisk.
- Administration and maintenance of ~700 hosts across the cloud and in several data center locations spanning Red Hat, CentOS, and Ubuntu.
- Reducing toil through Ansible, helping us configure and maintain patching, script and cron rollouts, installations and whole builds.
- Projects ranged from building database and application servers to environment migrations, P2V migrations, mass monitoring deployments, and much more.
- Helping maintain a 24/7 monitoring and alerting stack with Prometheus and Grafana, and customizing those deployments with Prometheus' textfile collector and PromQL.
- Building useful and robust dashboards in Grafana to deliver instant, simple and powerful metrics.
- OS and kernel tuning and performance improvements.
- Working on everything from KVM host maintenance to racking and stacking Dell and HP hardware to EC2 environment builds.
- Day-to-day tasks ranged from managing our Bacula backup solution and administering users and DNS via FreeIPA to handling typical ticket issues, delegating and self-starting as needed.
- Created and drove multiple off-hours maintenance windows, each presented, peer-approved, and thoroughly documented.
- Rotating in and out of a 24/7 on-call schedule.
Linux Systems Engineer, HighPoint Solutions (Acquired by IQVIA)
April 2017 - June 2019
Responsible for the AWS environments of multiple biopharmaceutical and commercial clients, with a strong focus on the healthcare space. I was also involved in maintaining physical hosts in a colocated data center, as well as migrating them to AWS.
- Primary engineer for multiple clients, responsible for assistance and execution of tasks such as maintenance windows, upgrades, and reporting.
- Extensive engineering of AWS for clients: standing up EC2 machines, managing their sites with ELB, Route 53, and Elastic IPs.
- Frequent use of S3 and S3 bucket policies to structure and enforce data access for larger clients.
- Use of Pure Storage and NetApp for volume management.
- Adhered to change control requirements for validated environments of larger clients.
Linux Systems Engineer, Contour Data Solutions
June 2016 - September 2016
VMware administration and engineering across 150 Windows and Linux servers.
- Installed and configured SolarWinds and InfoSight monitoring for new and existing clients.
- Visited client sites to determine the best IT-related solutions.
Linux Systems Engineer, CardConnect
February 2016 - April 2016
Deployed a fresh RHEL Satellite environment (server and capsules) for content across King of Prussia, Philadelphia, and St. Louis for a mid-sized payment processing platform.
- Engineering Nagios for alerting.
- Participated in weekly security audits. CardConnect as a firm worked extensively in payment processing; as such, transactions were under heavy scrutiny and needed to be examined thoroughly.
Staff Systems Engineer, Health Market Science (Acquired by LexisNexis)
April 2011 - November 2015
Administered and engineered an environment of over 1000 virtual machines running RHEL 4, 5, and 6 atop ESXi 5.5, along with a physical environment of 250 hosts running RHEL 4, 5, and 6, for a mid-sized healthcare analytics company, helping grow the infrastructure, which later led to an acquisition by LexisNexis.
- Configuration, maintenance and troubleshooting of RHEL, CentOS, and Windows virtual machines.
- JBoss and Tomcat administration and engineering, working with multiple development and QA teams to provide constant support.
- Engineering of Jenkins deployment service to push and maintain development, QA, and production code.
- Tidal Enterprise Scheduler support, deployment, and maintenance (RHEL).
- Active Directory, Windows 2008/2012 Server configuration, and Exchange administration.
- Maintained HIPAA compliance.