Resume
Jack Neely
jjneely at gmail dot com
https://linuxczar.net
https://github.com/jjneely
Summary
Founder and Fractional CTO with 25 years of experience building planet scale, multi-cloud observability platforms and distributed systems. Subject matter expert in Site Reliability Engineering, Prometheus, Grafana, OpenTelemetry, and AWS infrastructure. Proven leader architecting production systems from pre-seed startups to Fortune 500 enterprises. Strong hands-on coding skills combined with strategic technical leadership, leveraging AI-assisted development tools to accelerate implementations. Open source contributor and conference presenter specializing in SLO alerting, metrics architecture, and data-driven decision making. Experience building and mentoring globally distributed engineering teams.
Goals
Seeking consulting and fractional engagements as a Principal or Distinguished Architect specializing in observability, SRE, and platform engineering. Focus on strategic technical leadership including platform architecture, observability strategy, SLO design, and team mentorship. Comfortable with hands-on implementation when needed, but strongest impact comes from guiding technical direction, solving complex distributed systems challenges, and building frameworks that enable engineering teams to deliver reliable software at scale. Experience spans startups through Fortune 500 enterprises. Remote from Raleigh, NC.
Professional Experience
Owner
Cardinality Cloud, LLC, Raleigh, NC | February 2025 - Present
Fractional CTO for KindHabitLabs, Inc., Raleigh, North Carolina. Leading technical architecture for a pre-seed healthcare SaaS startup building Roo Mi, a HIPAA-compliant behavioral health platform for treatment facilities. February 2025 - Present.
- Design and implement production AWS infrastructure utilizing ECS, RDS PostgreSQL, CloudFront, VPC, and IAM with focus on cost optimization and long-term reliability.
- Lead migration from prototype to production-grade architecture enabling development team to build and scale healthcare software.
- Establish GitHub workflows and CI/CD pipelines using GitHub Actions for automated testing and deployments.
- Mentor development team on cloud architecture decisions, security practices, and software engineering workflows.
SaaS and Open Source product development for Cardinality Cloud. February 2025 - Present.
- Develop and maintain prometheus-alert-generator, an open source React-based tool for creating and maintaining SLO alerting rules in Prometheus. Reached 100+ users in first week of launch. Leverage AI-assisted coding tools to accelerate feature development and improve code quality.
- Active development and maintenance of open source observability tooling utilizing AI-assisted development practices to rapidly prototype and iterate on new capabilities.
Sr. Principal DevOps Observability Architect
Palo Alto Networks, Raleigh, NC | December 2020 - September 2025
Technical lead for Observability in the Prisma Cloud (public cloud security) division.
- Technical lead of the global observability team.
- Developed and maintained PostgreSQL schema for observability and big data use cases including partitioned tables, materialized views, and stored procedures.
- Built road maps, plans, and timelines to manage the rollout, implementation, and upgrades of metrics, traces, logs/events, real user monitoring, OpenTelemetry, and synthetic monitoring within an enterprise culture.
- Implemented a unified and global Observability Platform for Prisma Cloud covering metrics, traces, logging, and alerting based on Grafana Mimir, Grafana Tempo, and a combination of Grafana Loki and OpenSearch.
- Designed the Observability Platform to support 80+ Kubernetes clusters running in multiple cloud service providers in all regions including China and US Government restricted regions. Integrated serverless applications and vendor supplied telemetry.
- Created required isolation zones for supporting Observability in compliance based partitions and deployments such as GDPR, China, and FedRAMP High.
- Created a migration plan away from a SignalFx metric solution with custom agents to a Cloud Native approach with Prometheus, Thanos, Spring Framework, and Micrometer.
- Initiated an Alert Review process to address alert fatigue and discover ongoing and unnoticed customer facing issues.
- Project managed the migration away from Splunk toward a globally designed OpenSearch and Grafana Loki based logging and eventing solution, saving $2.5 million per year.
- Defined a schema and plan for applications to adopt Structured Logs. Supported and configured structured log usage in Spring Boot, Logback, and similar Java tools.
- Managed vendor relationships.
- Rolled out PagerDuty and migrated PagerDuty account to corporate level account for better costing opportunities. Reported on every On Call rotation each week.
- Worked with global teams to design and solve high cardinality business intelligence use cases with streaming data pipelines utilizing AWS Kinesis and Apache Flink. Created interactive Grafana Dashboards to display and alert on results.
- Took ownership of OpenSearch clusters and improved throughput and stability into the 50+ TiB/day scale.
- Developed an automated score card tool with Grafana that tracks each functional area, team, and service. Reported on observability adoption and Service Level Objectives at each level.
- Built a centralized Thanos cluster supporting 150M+ metrics and created a plan to migrate from GCE VM instances running Prometheus to a centralized monitoring platform.
- Spearheaded Observability requirements and managed team completion of those requirements for multiple global product releases, including a product-wide migration to OpenTelemetry.
Senior Operations Engineer
42 Lines, Inc., Raleigh, NC | December 2013 - December 2020
Systems Architect, 42 Lines. Supported a young SaaS product and scaled operations and reliability with modern load balancing and telemetry. February 2020 - December 2020.
- Constructed a scalable load balancing solution using AWS Network Load Balancers, Auto Scaling Groups, and HAProxy that is able to manage stateful user sessions for existing custom software and newly developed SaaS products.
- Built relationships with partner companies to create a referral network working with the marketing team.
- Created an array of presentations and webinars showcasing Site Reliability Engineering best practices and how to use observability tools to make better business decisions.
- Gathered data for capacity planning and costs per customer by extracting meaningful value from log events in Elastic Stack and deploying a Prometheus based metrics solution.
- Introduced Prometheus and Grafana and built dashboards around the Four Golden Signals to visualize SaaS product behaviors.
Consulting for Mutations Limited, Los Angeles, California. Built and maintained a cloud agnostic Kubernetes ecosystem as a production environment for an IoT startup. November 2019 - February 2020.
- Integrated visibility services from Datadog into Amazon Elastic Kubernetes Service, including management of all log sources and metrics from many different sources and protocols.
- Deployed and maintained mission critical services in Kubernetes via Helm Charts including Confluent Kafka endpoints and streaming data manipulation with Confluent ksqlDB.
- Managed 3 Kubernetes clusters for multiple development, staging, and production environments.
Consulting for Fitbit, Inc., San Francisco, California. Designed and implemented a Prometheus and Thanos observability platform ingesting 8 million data points per second. October 2014 - November 2019.
- Implemented Thanos as a solution for Prometheus clustering and long term storage of data in GCS. Worked with upstream developers in Go to fix and merge TSDB block repair routines, bug fixes for pointer math, and several command line options to help build a migration path from a large Prometheus environment.
- Worked with client’s teams all over the globe in an effort to sunset all StatsD and Graphite metric instrumentation in favor of Prometheus. Taught the software engineering teams the Prometheus APIs and libraries for Python, Go, and Java as required.
- Shifted client’s entire Prometheus monitoring stack from bare metal hardware in IBM SoftLayer into the Google Cloud Platform. Containerized all components.
- Designed, planned, and implemented a distributed Prometheus based monitoring and telemetry infrastructure for a service oriented architecture supporting more than a hundred teams and more than 8 million samples per second.
- Patched Prometheus’s Histogram routines in Go to ensure buckets always increase monotonically when estimating quantiles to handle scrape consistency issues.
- Designed, managed, and upgraded a client’s Graphite and Grafana cluster supporting more than 30 million incoming metrics per minute, 300 terabytes of storage, and over 130 million unique time series.
- Contributor to the Open Source Graphite project with merge access. Multiple Python based patches written to increase the efficiency of Graphite and improve failure modes.
- Coded tools in Go to manage large Graphite clusters including rebalancing metrics and merging duplicate metrics. Code was an order of magnitude faster than similar Python tools.
- Architected a solution to ingest more than 2.5 million StatsD metrics per second and feed aggregate metrics to Graphite.
- Coded StatsRelay as an Open Source StatsD load balancer using Google’s Jump consistent hashing algorithm in Go. A single instance could handle 700,000 UDP packets per second.
- Replaced Etsy StatsD NodeJS daemon with Statsite written in C. Patched Statsite to add configuration options and to set socket options required for higher throughput. Additional patches to fix bugs in the event driven architecture.
Consulting for the Academy of Art University, San Francisco, California. December 2013 - October 2014.
- Built Nagios/Merlin monitoring systems to achieve single pane of glass monitoring for infrastructures spanning the globe and multiple cloud providers. Multiple bug fixes in C submitted and accepted by the OP5 team developing Merlin.
- Migrated an in-house Amazon EC2 provisioning system away from Chef to an Ansible based system.
Operations and Systems Specialist
NC State University Office of Information Technology, Raleigh, NC | April 2006 - November 2013
- Architect of NC State University’s Linux deployment. Continued as project lead of NCSU Realm Linux, supporting over 2,000 workstations and servers and more than 100,000 users.
- Technical lead for the deployment of Red Hat Enterprise Linux throughout the University of North Carolina System’s 16 universities.
- Designed automated tools to better support Realm Linux including hands-off installs using PXE and Red Hat Kickstart.
- Designed and built a configuration management solution using Bcfg2 that was used by system administrators for a campus of 100,000 users. The solution was also effective where IT groups needed to be partitioned from one another. Planned and implemented a migration of this system to Puppet.
- Coded and implemented a dynamic Kickstart system for all of campus using Python, Genshi Templates, and XMLRPC.
- Created and maintained many RPM packages including OpenAFS packages.
- Built an RPM package build system using Subversion and Mock.
- Contributed to a number of Open Source projects such as MoinMoin, Yum, Anaconda, Bcfg2, Up2date, and others.
- Wrote production quality PAM modules in C to implement LDAP based authorization.
- Upgraded and took primary responsibility of the campus Kerberos authentication system. Moved the campus off of the Kerberos 4 protocol.
- Upgraded and took primary responsibility for the NC State University’s public NTP service.
- Implemented an inexpensive load balancing and high availability solution using LVS, Keepalived, and spanned network VLANs through the data centers. This system load balanced NC State’s main website, LDAP infrastructure, RHN Satellites, Webmail, Linux installs, and many other services.
- Upgraded the Cyrus IMAP implementation that supported over 100,000 users to new hardware, latest Realm Linux version, and the most current Cyrus IMAP software.
- Built Xen and KVM based virtual machines for optimal use of physical hardware.
- Trained users, help desk staff, and other system administrators on a regular basis, including topics such as configuration management with Puppet, RPM packaging, RAID and LVM usage, and deploying Realm Linux.
- Wrote and maintained documentation and best practices guidelines for various topics in Linux administration.
- Started and organized NC State University’s FOSS Fair, an annual unconference style event for topics in Free and Open Source Software, beginning in 2009.
Systems Programmer I
NC State University College of Physical and Mathematical Sciences, Raleigh, NC | 2001 - 2006
- Took on responsibility as project lead for NCSU Realm Linux.
- Managed the campus wide install base of Realm Linux at over 1,000 machines.
- Deployed and managed a Red Hat Network Satellite server and its supporting Oracle 9iR2 database.
- Served as contact point for campus regarding Linux security issues, bugs, enhancements, and features for Red Hat style Linux distributions.
- Designed, deployed, and administered a 100+ node Beowulf Cluster based on RHEL, Sun Grid Engine, and MPI.
- Built supporting infrastructure for the Beowulf including fiber channel storage arrays, deployment of Brocade FC switches, and Cisco 3750 network switches.
- Supported the Beowulf users as they created live hurricane prediction models submitted to the National Weather Service.
- Participated in the server room design process for the room that housed several Beowulf Clusters and other servers.
Systems Administrator
NC State University Department of Physics, Raleigh, NC | 2000 - 2001
- Worked with faculty and graduate students to troubleshoot problems and identify solutions.
- Tested and deployed Realm Linux and other Linux distributions.
- Gained experience with Solaris, AIX, IRIX, and ULTRIX.
Research
NC State University Department of Chemistry, Raleigh, NC | 1999 - 2000
- Created professional quality video with Linux and developed a process to master recordings onto LaserDisc and produce VHS tapes from the master.
- Wrote C code to generate models of a molecular “bridge” using graph algorithms. This code was used to design molecules that identify and bond to cancer cells and leave normal cells untouched.
Instructor
Sandhills Community College, Pinehurst, NC | Summer of 1999
Conference Presentations
Monitorama PDX 2023
Portland, Oregon | June 2023
Observability Data Engineering: A Story About Math, Four Golden Signals, and Business Intelligence. A brief tour of the Google SRE Four Golden Signals and the mathematical concepts that inform them, including enterprise use cases of tracking individual customers in observability data. Includes advanced techniques for aggregating and rolling up percentiles.
- Slides and Recording: https://linuxczar.net/monitorama-pdx-2023
All Things Open 2020
Raleigh, North Carolina | October 2020
Finding the Golden Signals with Prometheus. Scale a business by using Prometheus. A walk through of identifying how to instrument applications, use that data to create Service Level Objectives, and create Burn Rate style alerting while avoiding common challenges with large data sets in Prometheus.
- Slides and Recording: https://linuxczar.net/ato-2020
Monitorama PDX 2019 Lightning Talks
Portland, Oregon | June 2019
5 Neat Prometheus Tricks: PromQL and the Power of 1. A 5 minute lightning talk covering Prometheus’s query language and 5 useful ways to build advanced alerting and Service Level Objective hacks.
- Slides and Recording: https://linuxczar.net/monitorama-pdx-2019
Education
B.S. in Computer Science
North Carolina State University, Raleigh, NC
May 2002
Activities and Honors
Practical Operations Podcast – operations.fm
- Co-host of the Practical Operations Podcast with more than 100 episodes recorded featuring best practices in the Operations, DevOps, SRE, Observability, and infrastructure fields.
Professional Awards and Associations
- Gertrude Cox Award for Innovative Excellence in Teaching and Learning with Technology winner for the Realm Linux project.
- Triangle Linux Users’ Group (TriLUG) member.
- Former President of the NC State University Linux Users’ Group.
Musician
- Tenor section leader for the North Carolina Master Chorale.
- Vocalist for St. Michael’s Episcopal Church, the Raleigh Convocation Choir of the Episcopal Diocese of North Carolina, and Schola Cantorum of the Episcopal Diocese of East Carolina.
- Board member of the Raleigh Convocation Choir.
- Frequent choral performer with the North Carolina Symphony.
- Leadership team for the Carolina Summer Choral Residency of the Royal School of Church Music in America.