As part of the Global Infrastructure Operation, the Senior Infrastructure Engineer – Tools will be responsible for the installation, configuration, administration, documentation, and support of all the tools used in the Integrated Operation Center (IOC). Availability of the monitoring infrastructure will also be an important focus of this role. Useful reports will be generated and shared with the various operation towers, the business, and the application teams.
The Senior Infrastructure Engineer – Tools will play an active role in the following phases:
Collaborate with the Technology Architecture & Tools Deployment team on new requirements, improvements, or issues that may require re-architecting the monitoring platforms or selecting a new vendor technology through the procurement process (RFI/RFP) and Proof of Concept (POC).
Work with application teams and members of other operation towers to analyze the capacity and performance requirements needed to set up monitoring parameters.
Work with vendors to create designs and configurations and translate these into Bills of Materials (BOMs) that satisfy requirements and meet budget objectives. Designs must be aligned with the architectural directions, must follow industry and market best practices, and must take into consideration the supportability requirements defined by the Global Infrastructure team.
Work together with the Tools team on any implementation, testing, and rollout into production.
Provide SME leadership in resolving issues discovered during implementation, while also considering the impact of those discoveries on the overall design.
Support the resolution of major issues and outages that require the involvement of a Subject Matter Expert (SME) in monitoring technologies.
I. ESSENTIAL FUNCTIONS:
Maintain the corporate monitoring infrastructure and ensure all monitoring systems are compliant with global infrastructure standards and security policies.
Develop and document processes and procedures for maintaining the monitoring infrastructure at both the system and application level.
Handle the daily administration of all tools used for monitoring, troubleshooting, and performance.
Perform hands-on tool setup, configuration, and fine-tuning.
Review and introduce new tools that will bring value and benefit to the business.
Seek opportunities to introduce automation in the monitoring process.
Conduct event and impact analysis together with the Tools team to identify opportunities for improvement.
Generate useful reports for the business, the various infrastructure towers, and the application teams.
II. JOB QUALIFICATIONS:
A Bachelor’s Degree in Computer Science, Engineering, Science, Math, or a related discipline with an IT emphasis is required.
A minimum of 5-7 years of continuous experience in the following:
Design, development, implementation, and maintenance of large-scale systems, preferably across multiple hardware and software platforms.
Good communication skills, including strong verbal, written, and presentation skills.
Working knowledge of IT Service Management frameworks and/or ITIL concepts and practices. ITIL certification is a plus.
Knowledge of BMC tools, including:
BMC ProactiveNet Performance Management (BPPM)
BMC Event & Impact Manager (BEIM)
BMC Atrium Discovery & Dependency Mapping (ADDM)
Transaction Manager Application Response Time (TMART)
Extensive experience in UNIX (Red Hat Linux, SUSE Linux) and Windows-based operating systems (Windows 2003, Windows 2008, Windows XP, Windows Vista, and Windows 7).
Deep understanding of CPU, memory, and disk I/O performance management.
Broad knowledge of computing technologies: blade and rack-mounted Intel x86 and AMD-based commodity servers, and enterprise-class platforms (Cisco UCS C- and B-series, IBM Power7, HP rack-mountable and blade servers, Dell rack-mountable and blade servers).
Experience in designing complex computing infrastructure solutions in a mid-to-large scale data center environment.
Experience with tools, components, methodologies, and protocols related to the latest computing platforms, particularly blade technologies (Cisco UCS Manager, stateless configurations and Service Profiles, the Unified Fabric concept, Converged Network Adapters, and Fibre Channel over Ethernet).
Knowledge of both server and desktop virtualization technologies (VMware, Citrix, IBM PowerVM, Microsoft).
Good understanding of key virtualization concepts and how these are impacted by the computing platform.
Extensive experience in performance tuning and optimizing computing platforms.
Deep understanding of, and experience in, building large computing solutions that scale both vertically and horizontally.
Broad knowledge and understanding of:
Storage technologies, backup solutions and methodologies (EMC, HP, IBM, Hitachi), and storage area network solutions (Cisco, Brocade)
Network technologies (Cisco Nexus 7K, 5K, and 1K solutions)
Systems and infrastructure management technologies (tools from HP and BMC)
Familiarity with application software (SAP, iPlanet, WebSphere, TIBCO, Apache, Veritas, MQSeries, JBoss) and databases (Oracle, DB2, and Microsoft SQL Server).
Knowledge of Active Directory and LDAP technologies.
Strong understanding of high availability concepts, with practical experience implementing key components (databases, application servers, fault-tolerant systems, cluster servers/software, etc.).
Working knowledge of TCP/IP, FTP, SMTP, Sendmail, NFS, DNS, BIND, and other network services.
Sensitivity to security best practices as related to infrastructure and application architectures.
Experience with BMC tools, especially BPPM, PATROL (PCO/PCM), and BEM.