Position Description/Responsibilities (CCC/Vendor):
Our client is looking for a Linux Systems Administrator for a permanent position. Research Computing and Cyberinfrastructure (RCC), a unit of Information Technology Services at our client, is seeking a talented and goal-oriented person to join our team as an HPC System Administrator, helping to provide high-performance research computing and visualization systems and services. RCC provides comprehensive system services and applications consulting for the computational research and teaching endeavors of our client's faculty and their students. Our systems professionals have ongoing opportunities to work with leading-edge technologies and to expand their technical expertise, while working in the stimulating intellectual environment of supporting diverse research pursuits at our client, one of the nation's leading research institutions.
As part of our HPC system administration team, you will work with other system administrators, programmers, application support experts, and faculty partners on the design, implementation, ongoing operation, and user support of high-end computing systems.
Core responsibilities will include system administration, software maintenance, system monitoring, and troubleshooting for HPC clusters and related infrastructure. You will also provide user support, documentation, and consulting to help ensure the effective use of HPC resources by faculty members and their students. Within the group, you will have the opportunity to work with a range of systems and technologies such as compute clusters, parallel file systems, high-speed interconnects, GPU-based computing, and database servers. You will be encouraged to stay abreast of market trends and to participate in the group's evaluation and adoption of early-stage technologies before they come to market.
In the past year, the HPC system administration team has deployed a 2,112-core Linux cluster with QDR InfiniBand, an NVIDIA GPU cluster with 24 M2090 GPUs, a 288-core interactive-use Linux cluster, a large-memory Linux cluster with servers having up to 1 TB of RAM, a 2.5 PB tape library, an 8 Gbps Fibre Channel SAN, a 10 Gbps Ethernet core capable of Data Center Bridging, a data center area with 32 new water-cooled computer racks, and much more. In the past year we have also partnered with vendors to evaluate demo hardware for 10 Gbps iWARP Ethernet, 40 Gbps RoCE Ethernet, 16 Gbps Fibre Channel, and flash memory arrays capable of one million IOPS. While exploring and evaluating these leading-edge technologies, we have also helped support hundreds of researchers using our systems. We are currently preparing for several new clusters, over 750 TB of additional storage, a 16 Gbps Fibre Channel SAN, and other projects that will be deployed over the next few months. Our work is dynamic and exciting, and it offers team members the opportunity for personal growth and the ability to take on new responsibilities and learn new things.
Position Requirements/Technical Skills (CCC/Vendor):
Required Skills:
- Linux systems administration: We run over 900 Linux systems and have over 3000 active user accounts - you will need to know your way around Linux very well to help develop and support this environment.
- Working knowledge of computer architecture, storage, and networking concepts: We investigate and evaluate many technologies when designing new systems. Understanding how computers work internally and as part of larger networks is quite important for this position.
- Scripting in languages such as Perl, Python, awk, and bash/csh/ksh: Scripts are key to automating tasks across our infrastructure - you will need to be able to understand existing scripts and write new ones to work efficiently.
- Programming in a language such as C, C++, FORTRAN, or Java: It is not uncommon that we need to dig into the internals of some program to fix a bug, extend a feature, or understand what the program is trying to do.
- Software installation and maintenance: We work directly with the research community and often install and maintain both commercial and open source software for them on the systems we run.
- The ability to self-learn: We are constantly learning and doing new things. To be a productive member of the team, you will need to have the curiosity and ability to jump into a new technology and figure it out.
The following skills are not requirements of a successful applicant, but candidates with these skills are strongly encouraged to apply.
- Experience with developing and maintaining networks that service users in different administrative domains.
- Parallel programming experience in MPI, OpenMP, Pthreads, CUDA, or other parallel languages.
- Experience with deploying and maintaining a distributed Windows computing infrastructure.
- Experience with parallel file systems such as GPFS or Lustre.
- Strong oral and written English communication skills.
- Strong interpersonal skills and the ability to work well in a team environment.
CAI is an EOE.
Company Overview (CCC Only):
For immediate consideration and interviews, please forward your resume and/or contact me directly at 717-651-3235 or email@example.com
Computer Aid - 19 months ago