Objective
Our goal is to monitor Microsoft Lync Server 2013/Skype for Business in BMC PATROL and/or BMC TrueSight Operations Management to detect failures, errors and performance problems.
This configuration has been designed to monitor Microsoft Lync Server 2013/Skype for Business based on a standard environment. It may need to be customized to suit your specific needs. Sentry Software cannot be held responsible for any use which may be made of the provided file.
Solution
Monitoring Studio KM for PATROL
Our solution relies on Monitoring Studio KM for PATROL, which is a configurable module for BMC PATROL (and therefore BMC TrueSight Operations Management). Typically, a PATROL administrator uses Monitoring Studio KM's GUI to set up the monitoring of anything for which no built-in KM is available. This setup is stored in the PATROL Agent's configuration.
More information about Monitoring Studio KM is available on Sentry’s Web site.
Sentry has created a configuration for Monitoring Studio KM to monitor Microsoft Lync Server 2013/Skype for Business. This article simply explains how to import this pre-built configuration and customize it to monitor your Microsoft Lync/Skype for Business environment.
Monitored components
The architecture of Microsoft Lync Server 2013/Skype for Business can be quite complex as it involves several systems with different roles, where each role runs different components of Lync/Skype for Business Server: Archive Servers, Edge Servers, Quality Monitoring Servers, Microsoft SQL Server, etc.
Note: The solution described here does not cover Microsoft SQL Server or other non-Lync components that may participate in the architecture, like NLB, MSCS, etc. The user is invited to configure other KMs specific to these technologies, notably:
- BMC Performance Management for Servers (for Microsoft Windows infrastructure, including Windows itself, MSCS, NLB, MSMQ, etc.)
- BMC Performance Management for Databases (for Microsoft SQL Server)
The solution monitors a mixture of Windows services, processes, Windows Event Logs and performance counters that are grouped based upon which component of Lync/Skype for Business Server they relate to.
1/Windows services
The status of the services listed in the table below is monitored by the solution. Each service is represented with a separate object in the console. The “Status” parameter reports the status of the service. This parameter will trigger an alarm if the service is stopped and a warning if the service is in an intermediate state (stop pending, paused, etc.).
Lync/Skype for Business Component | Monitored Windows Service |
LSApplicationCallAnnouncementService_RTCCAS | LyncServerConferencingAnnouncement(RTCCAS) |
LSApplicationCallParkService_RTCCPS | LyncServerCallPark(RTCCPS) |
LSApplicationConferencingAutoAttendant_RTCCAA | LyncServerConferencingAttendant(RTCCAA) |
LSApplicationPolicyDecisionPoint_RTCPDPAUTH | LyncServerBandwidthPolicyService(Authentication)(RTCPDPAUTH) |
LSApplicationPolicyDecisionPoint_RTCPDPCORE | LyncServerBandwidthPolicyService (RTCPDPCORE) |
LSApplicationResponseGroup_RTCRGS | LyncServerResponseGroup(RTCRGS) |
LSArchiving_RTCLOG | LyncServerArchiving(RTCLOG) |
LSCentralManagementFTA_FTA | LyncServerFileTransferAgent(FTA) |
LSCentralManagementMasterAgent_MASTER | LyncServerMasterReplicatorAgent(MASTER) |
LSConferencingApplicationSharing_RTCASMCU | LyncServerApplicationSharing(RTCASMCU) |
LSConferencingAudioProvider_RTCACPMCU | LyncOnlineTelephonyConferencing(RTCACPMCU) |
LSConferencingAudioVideo_RTCAVMCU | LyncServerAudio/VideoConferencing(RTCAVMCU) |
LSConferencingData_RTCDATAMCU | LyncServerWebConferencing(RTCDATAMCU) |
LSConferencingInstantMessage_RTCIMMCU | LyncServerIMConferencing(RTCIMMCU) |
LSConferencingWeb_RTCMEETINGMCU | LyncServerWebConferencingCompatibility(RTCMEETINGMCU) |
LSCoreReplicationAgent_REPLICA | LyncServerReplicaReplicatorAgent(REPLICA) |
LSEdgeAccess_RTCSRV | LyncServerFront-End(RTCSRV) |
LSEdgeAudioVideoAuthentication_RTCMRAUTH | LyncServerAudio/VideoAuthentication(RTCMRAUTH) |
LSEdgeAudioVideo_RTCMEDIARELAY | LyncServerAudio/VideoEdge(RTCMEDIARELAY) |
LSEdgeWebConferencing_RTCDATAPROXY | LyncServerWebConferencingEdge(RTCDATAPROXY) |
LSMediation_RTCMEDSRV | LyncServerMediation(RTCMEDSRV) |
LSMonitoringCallDetailsReporting_RTCCDR | LyncServerCallDetailRecording(RTCCDR) |
LSMonitoringQoE_RtcQms | LyncServerQoEMonitoringService(RtcQms) |
LSMonitoring_RTCATS | LyncServerAudioTestService(RTCATS) |
LSProvisioning_RtcProv | LyncOnlineProvisioningService(RtcProv) |
LSRegistration_RTCSRV | LyncServerFront-End(RTCSRV) |
LSUser_RTCSRV | LyncServerFront-End(RTCSRV) |
LSWeb_W3SVC | WorldWideWebPublishingService(w3svc) |
Figure 1 - Monitoring a Windows Service
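The alert logic of the “Status” parameter described above can be pictured with a minimal Python sketch. The state names, the `service_severity` function and the severity labels are illustrative assumptions; they are not identifiers taken from Monitoring Studio KM.

```python
# Sketch of the Status-to-alert mapping: stopped -> alarm,
# intermediate state -> warning. All names here are hypothetical.

INTERMEDIATE_STATES = {
    "start pending",
    "stop pending",
    "pause pending",
    "continue pending",
    "paused",
}


def service_severity(state: str) -> str:
    """Return 'OK', 'WARN' or 'ALARM' for a Windows service state."""
    normalized = state.strip().lower()
    if normalized == "running":
        return "OK"
    if normalized == "stopped":
        return "ALARM"
    if normalized in INTERMEDIATE_STATES:
        return "WARN"
    # Assumption: an unrecognized state is surfaced as a warning
    # rather than silently treated as healthy.
    return "WARN"
```

For example, `service_severity("Stop Pending")` yields a warning, while `service_severity("Stopped")` yields an alarm.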
2/Processes
Key performance metrics are constantly monitored for all the Lync/Skype for Business Server processes. Each process is monitored independently and displayed as a separate object in the console. The sum of all of these processes is available as a separate object as well so administrators can check the system resource consumption of Lync/Skype for Business Server in general and of each process individually.
Monitored processes:
- Process: ASMCUSvc.exe
- Process: AVMCUSvc.exe
- Process: DataMCUSvc.exe
- Process: FileTransferAgent.exe
- Process: IMMCUSvc.exe
- Process: MasterReplicatorAgent.exe
- Process: MeetingMCUSvc.exe
- Process: OcsAppServerHost.exe RTCATS
- Process: QoEAgent.exe
- Process: ReplicaReplicatorAgent.exe
- Process: RTCSrv.exe
- Process: w3wp.exe
- Total Lync Server Processes (all processes that have %{PATH} in their path)
For each of these processes, the following parameters are monitored:
- HandleCount
- PageFaultsPerSec
- PagefileBytes
- PrivateBytes
- ProcessorTime
- ThreadCount
- VirtualBytes
- WorkingSet
Figure 2 - Process Monitoring
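To illustrate how the "Total Lync Server Processes" object relates to the individual processes, here is a small Python sketch that sums a few of the listed parameters over every process whose executable path contains the installation directory. The `ProcessSample` type, the sample values and the paths are hypothetical; the actual collection is performed by Monitoring Studio KM.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """One observation of a process (hypothetical structure)."""
    name: str
    path: str
    handle_count: int
    private_bytes: int
    thread_count: int


def total_for_install_path(samples, install_path):
    """Sum selected metrics over processes located under install_path."""
    needle = install_path.lower()
    # Windows paths are case-insensitive, so compare in lower case.
    matching = [s for s in samples if needle in s.path.lower()]
    return {
        "process_count": len(matching),
        "HandleCount": sum(s.handle_count for s in matching),
        "PrivateBytes": sum(s.private_bytes for s in matching),
        "ThreadCount": sum(s.thread_count for s in matching),
    }


# Made-up sample data for illustration only.
samples = [
    ProcessSample("RTCSrv.exe", r"D:\Lync\Server\Core\RTCSrv.exe", 1200, 500_000_000, 80),
    ProcessSample("IMMCUSvc.exe", r"D:\Lync\Server\IMMCU\IMMCUSvc.exe", 600, 200_000_000, 40),
    ProcessSample("notepad.exe", r"C:\Windows\notepad.exe", 50, 5_000_000, 2),
]
totals = total_for_install_path(samples, r"D:\Lync")
```

In this sketch, only the two processes located under `D:\Lync` contribute to the totals.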
By default, only the ProcessorTime parameter triggers a warning when it reaches 90% five times in a row and 99% twice in a row. These thresholds (like any alert threshold in Monitoring Studio) can be customized through the interface:
Figure 3 - Setting Thresholds
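The "N times in a row" behavior of the ProcessorTime thresholds can be sketched as follows. This is only an illustration of the consecutive-breach logic using the default values quoted above; the function names are hypothetical and the sketch deliberately returns the matched condition rather than a severity.

```python
def reached_n_in_a_row(samples, threshold, required):
    """True if the most recent `required` samples are all >= threshold."""
    if required <= 0 or len(samples) < required:
        return False
    return all(value >= threshold for value in samples[-required:])


def processor_time_condition(samples):
    """Return the default condition matched by the latest samples, if any."""
    # Check the stricter condition first.
    if reached_n_in_a_row(samples, 99, 2):
        return "99% twice in a row"
    if reached_n_in_a_row(samples, 90, 5):
        return "90% five times in a row"
    return None
```

A single spike therefore does not match: `processor_time_condition([20, 100, 30])` returns `None`, because the condition must hold on consecutive collections.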
3/Windows Event Logs
When Microsoft Lync/Skype for Business Server encounters a problem, it reports it as an event in the Windows Event Log dedicated to Microsoft Lync/Skype for Business. These events are constantly monitored by the solution and any "warning" or "error" event related to Lync in the Windows Event Log will be reported in the console.
Each Event Log Source is represented with a separate icon in the console. For each instance, the MatchingEventCount parameter reports the number of warning and error events since the last reset of the counter. For each new event, the MatchingEventCount parameter is increased by one. As the alarm threshold is set to 1 on this parameter, an alert is triggered in the PATROL Console as soon as something wrong happens with Microsoft Lync/Skype for Business. The MatchingEventCount keeps the same value until it is manually acknowledged and reset by an operator. This acknowledgment can be configured to happen automatically after a certain amount of time.
Figure 4 - Monitoring Windows Event Logs
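The life cycle of the MatchingEventCount parameter (increment on each matching event, alarm at 1, value kept until reset) can be modeled with a short Python sketch. The `MatchingEventCounter` class and its method names are hypothetical and only mirror the behavior described above.

```python
class MatchingEventCounter:
    """Toy model of a MatchingEventCount-style parameter."""

    def __init__(self, alarm_threshold=1):
        self.alarm_threshold = alarm_threshold
        self.count = 0

    def record(self, level):
        """Count the event only if it is a warning or an error."""
        if level.lower() in ("warning", "error"):
            self.count += 1

    def in_alarm(self):
        """The alarm stays raised until the counter is reset."""
        return self.count >= self.alarm_threshold

    def reset(self):
        """Acknowledge the events (manually or after a timeout)."""
        self.count = 0


counter = MatchingEventCounter()
counter.record("Information")   # ignored
counter.record("Error")         # counted, alarm raised
alarm_before_reset = counter.in_alarm()
counter.reset()
alarm_after_reset = counter.in_alarm()
```

Because the value is only cleared by `reset()`, a single error keeps the alarm active until it is acknowledged.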
When a new event is detected, a PATROL event is generated with the exact content of the Windows event so that administrators can easily understand what is wrong and how to solve the problem. (The content of the events for Microsoft Lync/Skype for Business is notably detailed and provides much information on how to diagnose and troubleshoot the problem.) An annotation point is also added to the graph of the MatchingEventCount parameter.
Figure 5 - Event Annotation
The Windows Event Log sources that are monitored for Microsoft Lync Server/Skype for Business are:
- LS A/V Edge Server
- LS ACP MCU
- LS Address Book and Distribution List Expansion Web Service
- LS Address Book Server
- LS AppDomain Host Process
- LS Application Error
- LS Application Server
- LS Applications Module
- LS ApplicationSharing Conferencing Server
- LS Archiving Agent
- LS Archiving Server
- LS Audio-Video Conferencing Server
- LS Audio/Video Authentication Server
- LS Auto Update Server
- LS Bandwidth Policy Service (Authentication)
- LS Bandwidth Policy Service (Core)
- LS Call Detail Recording
- LS Call Park Service
- LS Certificate Manager
- LS Client Version Filter
- LS Common Library
- LS Conferencing Announcement Service
- LS Conferencing Attendant
- LS Configuration Provider
- LS Data MCU
- LS DB Access Layer
- LS Dialin Web Service
- LS Exchange Unified Messaging Routing
- LS File Transfer Agent Service
- LS IM MCU
- LS Inbound Routing
- LS Incoming Federation Service
- LS Intelligent IM Filter
- LS InterCluster Routing
- LS Join Launcher Web Service
- LS LDM
- LS Location Information Service
- LS Lync Web App
- LS Master Replicator Agent Service
- LS MCU Factory
- LS MCU Infrastructure
- LS Mediation Server
- LS Meeting MCU
- LS MGC ADMIN TOOL
- LS MGC CLIENT
- LS MGC COMMON
- LS MGC COMPLIANCE
- LS MGC CONFIG
- LS MGC ENDPOINT
- LS MGC LOADER
- LS MGC LOOKUP
- LS MGC SERVER
- LS MGC SERVICE
- LS MGC TRANSPORT
- LS Outbound Routing
- LS Outgoing Federation Service
- LS Password Expiry Check
- LS Protocol Stack
- LS Provisioning Service
- LS QoE Monitoring Agent
- LS QoE Monitoring Service
- LS Remote PowerShell
- LS Replica Replicator Agent Service
- LS Response Group Service
- LS Routing Data Sync Agent
- LS Script-Only Applications
- LS Server
- LS Software Update Service
- LS Translation Service
- LS User Replicator
- LS User Services
- LS UserPin Service
- LS Web Components Server
- LS Web Conferencing Edge Server
4/Performance Counters
Many performance counters are monitored constantly to report the activity of Microsoft Lync Server/Skype for Business and detect potential performance bottlenecks. Each performance counter is represented as a separate instance in the PATROL Console, grouped by the class of the corresponding Windows performance object.
Figure 6 - Performance Monitoring
Note: Some counters may be redundant and we recommend that you adapt them to your environment.
The list of monitored performance counters is summarized in the table below:
Lync Component | Performance Object Name | Counter Name |
LSApplication | LS:A/V Auth - Requests | - Bad Requests Received/sec |
LSApplication | LS:A/V Auth - Requests | - Credentials Issued/sec |
LSApplication | LS:A/V Auth - Requests | - Current requests serviced |
LSApplication | LS:CAA - Operations | CAA - Incomplete calls per sec |
LSApplication | LS:CAA - Planning | CAA - Current calls |
LSApplication | LS:CAA - Planning | CAA - Current calls on Music-On-Hold |
LSApplication | LS:CAA - Planning | CAA - Current calls waiting in the Lobby |
LSApplication | LS:CAA - Planning | CAA - Number of times retry logic was successful |
LSApplication | LS:CAA - Planning | CAA - Number of times retry logic was triggered |
LSApplication | LS:CAA - Planning | CAA - Total Application Endpoint creation failures |
LSApplication | LS:CAA - Planning | CAA - Total Application Endpoint termination failures |
LSApplication | LS:CAA - Planning | CAA - Total bandwidth failures |
LSApplication | LS:CAA - Planning | CAA - Total calls failed to transfer to the conference |
LSApplication | LS:CAA - Planning | CAA - Total calls user failed to enter conference id correctly three times |
LSApplication | LS:CAA - Planning | CAA - Total incomplete calls |
LSApplication | LS:CAS - Informational | CAS - Number of conferences joined |
LSApplication | LS:CAS - Informational | CAS - Total number of conferences joined |
LSApplication | LS:PDP Auth - Requests | - Bad Requests Received/sec |
LSApplication | LS:PDP Auth - Requests | - Credentials Issued/sec |
LSApplication | LS:PDP Auth - Requests | - Current requests serviced |
LSApplication | LS:RGS - Response Group Service Call Control | RGS - Total number of incoming calls declined because of high number of active calls |
LSApplication | LS:RGS - Response Group Service Call Control | RGS - Total number of incoming calls declined because of memory pressure |
LSApplication | LS:RGS - Response Group Service Hosting | RGS - Total number of incoming calls that were declined because of a Match Making failure |
LSApplication | LS:RGS - Response Group Service Match Making | RGS - Current number of calls |
LSApplication | LS:RGS - Response Group Service Workflow | RGS - Calls that failed due to critical server errors |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Failure Response Exceptions |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Generic Exceptions |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Real Time Exceptions |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Request |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Request error |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Requests Rejected |
LSConferencing | LS:AVMCU - Informational | AVMCU - Total MRAS Timeout Exceptions |
LSConferencing | LS:AVMCU - MCU Health And Performance | AVMCU - MCU Health State |
LSConferencing | LS:AVMCU - MCU Health And Performance | AVMCU - MCU Health State Changed Count |
LSConferencing | LS:AVMCU - Operations | AVMCU - Number of Conferences |
LSConferencing | LS:AVMCU - Operations | AVMCU - Number of Trusted Users |
LSConferencing | LS:AVMCU - Operations | AVMCU - Number of Users |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Active Ajax Viewers |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Active Conferences |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Active Data Channels |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Active Transcoders |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Connected Users |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Media Timeout Failures |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Packet Loss Failure |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Rdp Connection Timeout Failures |
LSConferencing | LS:AsMcu - AsMcu Conferences | ASMCU - Sip Dialog Failures |
LSConferencing | LS:AsMcu - MCU Health And Performance | ASMCU - MCU Health State |
LSConferencing | LS:AsMcu - MCU Health And Performance | ASMCU - MCU Health State Changed Count |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Active Conferences |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Average time queued in data Mcu for LDM messages |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Blocked files |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Blocked files/sec |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Conference workitems load |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Number of Unhandled Application Exception |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Session queues state |
LSConferencing | LS:DATAMCU - DataMCU Conferences | DATAMCU - Total data archiving events recorded. |
LSConferencing | LS:DATAMCU - MCU Health And Performance | DATAMCU - MCU Health State |
LSConferencing | LS:DATAMCU - MCU Health And Performance | DATAMCU - MCU Health State Changed Count |
LSConferencing | LS:ImMcu - IMMcu Conferences | IMMCU - Active Conferences |
LSConferencing | LS:ImMcu - IMMcu Conferences | IMMCU - Connected Users |
LSConferencing | LS:ImMcu - IMMcu Conferences | IMMCU - Throttled Sip Connections |
LSConferencing | LS:ImMcu - MCU Health And Performance | IMMCU - MCU Health State |
LSConferencing | LS:ImMcu - MCU Health And Performance | IMMCU - MCU Health State Changed Count |
LSConferencing | LS:MEDIA - Planning(*) | MEDIA - Number of occasions conference processing is delayed significantly |
LSConferencing | LS:SipEps - SipEps Connections(*) | SipEps - NumberOfDNSResolutionFailures |
LSConferencing | LS:SipEps - SipEps Connections(*) | SipEps - NumberOfDNSResolutionFailuresPerSecond |
LSEdge | LS:A/V Auth - Requests | - Bad Requests Received/sec |
LSEdge | LS:A/V Auth - Requests | - Credentials Issued/sec |
LSEdge | LS:A/V Auth - Requests | - Current requests serviced |
LSEdge | LS:PDP Auth - Requests | - Bad Requests Received/sec |
LSEdge | LS:PDP Auth - Requests | - Credentials Issued/sec |
LSEdge | LS:PDP Auth - Requests | - Current requests serviced |
LSEdge | LS:SIP - Access Edge Server Connections | SIP - Rejected External Edge Client Connections/sec |
LSEdge | LS:SIP - Access Edge Server Messages | SIP - External Messages/sec Dropped Due To Unresolved Domain |
LSEdge | LS:SIP - Access Edge Server Messages | SIP - Messages/sec Dropped Due To Unknown Domain |
LSEdge | LS:SIP - Load Management | SIP - Address space usage |
LSEdge | LS:SIP - Load Management | SIP - Average Holding Time For Incoming Messages |
LSEdge | LS:SIP - Load Management | SIP - Incoming Messages Timed out |
LSEdge | LS:SIP - Networking | SIP - Connections Refused Due To Server Overload |
LSEdge | LS:SIP - Networking | SIP - Connections Refused Due To Server Overload/Sec |
LSEdge | LS:SIP - Peers(*) | SIP - Average Outgoing Queue Delay |
LSEdge | LS:SIP - Peers(*) | SIP - Received Bytes |
LSEdge | LS:SIP - Peers(*) | SIP - Received Bytes/sec |
LSEdge | LS:SIP - Peers(*) | SIP - Sends Outstanding |
LSEdge | LS:SIP - Peers(*) | SIP - Sent Bytes |
LSEdge | LS:SIP - Peers(*) | SIP - Sent Bytes/sec |
LSEdge | LS:SIP - Protocol | SIP - Average Incoming Message Processing Time |
LSEdge | LS:SIP - Protocol | SIP - Messages In Server |
LSEdge | LS:SIP - Responses | SIP - Local 500 Responses/sec |
LSEdge | LS:SIP - Responses | SIP - Local 503 Responses/sec |
LSMediationPerf | LS:MediationServer - Global Counters | - Total failed calls caused by unexpected interaction from the Proxy |
LSMediationPerf | LS:MediationServer - Global Per Gateway Counters(*) | - Total failed calls caused by unexpected interaction from a gateway |
LSMediationPerf | LS:MediationServer - Health Indices | - Load Call Failure Index |
LSMediationPerf | LS:MediationServer - Inbound Calls(*) | - Current |
LSMediationPerf | LS:MediationServer - Inbound Calls(*) | - Total attempts |
LSMediationPerf | LS:MediationServer - Inbound Calls(*) | - Total established |
LSMediationPerf | LS:MediationServer - Inbound Calls(*) | - Total rejected due to load |
LSMediationPerf | LS:MediationServer - Media Relay | - Media Connectivity Check Failure |
LSMediationPerf | LS:MediationServer - Outbound Calls(*) | - Current |
LSMediationPerf | LS:MediationServer - Outbound Calls(*) | - Total attempts |
LSMediationPerf | LS:MediationServer - Outbound Calls(*) | - Total established |
LSMediationPerf | LS:MediationServer - Outbound Calls(*) | - Total rejected due to load |
LSRegistrationPerf | LS:Arch Agent - MSMQ | Arch Agent - Archiving Message bytes/sec |
LSRegistrationPerf | LS:Arch Agent - MSMQ | Arch Agent - Archiving Messages/sec |
LSRegistrationPerf | LS:Arch Agent - MSMQ | Arch Agent - Call Details Recording Message bytes/sec |
LSRegistrationPerf | LS:Arch Agent - MSMQ | Arch Agent - Call Details Recording Messages/sec |
LSRegistrationPerf | LS:SIP - Authentication | SIP - Authentication System Errors/sec |
LSRegistrationPerf | LS:SIP - Authentication | SIP - Incoming Messages Not Authenticated/sec |
LSRegistrationPerf | LS:SIP - Authentication | SIP - Incoming Messages Not Authorized/sec |
LSRegistrationPerf | LS:SIP - Load Management | SIP - Address space usage |
LSRegistrationPerf | LS:SIP - Load Management | SIP - Average Holding Time For Incoming Messages |
LSRegistrationPerf | LS:SIP - Load Management | SIP - Incoming Messages Timed out |
LSRegistrationPerf | LS:SIP - Networking | SIP - Connections Refused Due To Server Overload |
LSRegistrationPerf | LS:SIP - Networking | SIP - Connections Refused Due To Server Overload/Sec |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Average Outgoing Queue Delay |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Connections Active |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Received Bytes |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Received Bytes/sec |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Sent Bytes |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - Sent Bytes/sec |
LSRegistrationPerf | LS:SIP - Peers(*) | SIP - TLS Connections Active |
LSRegistrationPerf | LS:SIP - Protocol | SIP - Average Incoming Message Processing Time |
LSRegistrationPerf | LS:SIP - Protocol | SIP - Messages In Server |
LSRegistrationPerf | LS:SIP - Responses | SIP - Local 500 Responses |
LSRegistrationPerf | LS:SIP - Responses | SIP - Local 500 Responses/sec |
LSRegistrationPerf | LS:SIP - Responses | SIP - Local 503 Responses/sec |
LSRegistrationPerf | LS:USrv - Endpoint Cache | USrv - Active Registered Endpoints |
LSRegistrationPerf | LS:USrv - REGDBStore | USrv - Queue Depth |
LSUser | LS:MCUF - MCU Factory | MCUF - GetMCU Requests Received/sec |
LSUser | LS:MCUF - MCU Factory | MCUF - Health Notifications Received/sec |
LSUser | LS:MCUF - MCU Factory | MCUF - Total Drain Requests Received |
LSUser | LS:MCUF - MCU Factory | MCUF - Total GetMCU Requests Failed |
LSUser | LS:MCUF - MCU Factory | MCUF - Total GetMCU Requests Received |
LSUser | LS:MCUF - MCU Factory | MCUF - Total Health Notifications Failed |
LSUser | LS:MCUF - MCU Factory | MCUF - Total Health Notifications Received |
LSUser | LS:MCUF - MCU Factory | MCUF - Total empty GetMCU Responses |
LSUser | LS:USrv - BACKUPXDSDBStore | USrv - Queue Depth |
LSUser | LS:USrv - DBStore | USrv - Queue Depth |
LSUser | LS:USrv - Endpoint Cache | USrv - Active Registered Endpoints |
LSUser | LS:USrv - GetPresence sproc | USrv - Sproc calls/Sec |
LSUser | LS:USrv - Pool Conference Statistics | USrv - Active Conference Count |
LSUser | LS:USrv - Pool Conference Statistics | USrv - Active Focus Endpoint Count |
LSUser | LS:USrv - Pool Conference Statistics | USrv - Active Mcu Session Count |
LSUser | LS:USrv - Pool Conference Statistics | USrv - Active Participant Count |
LSUser | LS:USrv - Pool Conference Statistics | USrv - Conference Count |
LSUser | LS:USrv - REGDBStore | USrv - Queue Depth |
LSUser | LS:USrv - ReplicationDBStore | USrv - Queue Depth |
LSUser | LS:USrv - Rich presence service SQL calls | USrv - RtcPublishMultipleCategories Sproc calls/Sec |
LSUser | LS:USrv - Rich presence subscribe SQL calls | USrv - Average number of users per subscribe request |
LSUser | LS:USrv - Rich presence subscribe SQL calls | USrv - RtcBatchQueryCategories Sproc calls/Sec |
LSUser | LS:USrv - Rich presence subscribe SQL calls | USrv - RtcBatchSubscribeCategoryList Sproc calls/Sec |
LSUser | LS:USrv - Rich presence subscribe SQL calls | USrv - RtcSubscribeSelf Sproc calls/Sec |
LSUser | LS:USrv - SHAREDDBStore | USrv - Queue Depth |
LSUser | LS:USrv - Server Aggregation | USrv - Number of aggregation requests/second |
LSUser | LS:USrv - Service | USrv - MWI NOTIFYs received/Sec |
LSUser | LS:USrv - UpdateEndpoint sproc | USrv - Sproc calls/Sec |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to Incorrect meeting URL format |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to Lookup User failure |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to Verify Conference Key failure |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to base URL mapping to multiple tenants |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to failure to lookup Base URL |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to join taking longer than expected time |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures due to no matching Regex found |
LSWeb | LS:JoinLauncher - Join Launcher Service Failures | JOINLAUNCHER - Join failures to send Conference Error Reports. |
LSWeb | LS:JoinLauncher - Join Launcher Service incoming Requests | JOINLAUNCHER - Incoming join requests |
LSWeb | LS:JoinLauncher - Join Launcher Service incoming Requests | JOINLAUNCHER - Incoming join requests from ARM devices |
LSWeb | LS:WEB - Address Book File Download | WEB - Average processing time for a succeeded file request in milliseconds |
LSWeb | LS:WEB - Address Book File Download | WEB - Failed File Requests/Second |
LSWeb | LS:WEB - Address Book File Download | WEB - Succeeded File Requests/Second |
LSWeb | LS:WEB - Address Book Web Query | WEB - Average processing time for a address book database query in milliseconds |
LSWeb | LS:WEB - Address Book Web Query | WEB - Average processing time for a search request in milliseconds |
LSWeb | LS:WEB - Address Book Web Query | WEB - Failed search requests/sec |
LSWeb | LS:WEB - Address Book Web Query | WEB - Successful search requests/sec |
LSWeb | LS:WEB - Device Update | WEB - Total Log Upload Attempts |
LSWeb | LS:WEB - Device Update | WEB - Total Update Requests |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Average Active Directory Fetch time in milliseconds |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Average member properties fetch time in milliseconds |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Soap exceptions/sec |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Successful Request Processing Time |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Timed out Active Directory Requests/sec |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Timed out Requests that fetch member properties/sec |
LSWeb | LS:WEB - Distribution List Expansion | WEB - Valid User Requests/sec |
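When checking one of these counters by hand (for example in Windows Performance Monitor), the object and counter names from the table combine into a standard Windows counter path of the form `\Object\Counter`. The Python helper below is a minimal sketch of that combination; the function name is hypothetical, and it assumes the names in the table are used verbatim, including the `(*)` instance wildcard.

```python
def counter_path(performance_object, counter_name):
    # Windows counter paths have the form \Object(Instance)\Counter.
    # The object name from the table already carries "(*)" when the
    # object has multiple instances.
    return "\\" + performance_object + "\\" + counter_name


path = counter_path("LS:SIP - Peers(*)", "SIP - Connections Active")
```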
The following alert thresholds are set by default:
Lync/Skype for Business Component | Performance Object Name | Counter Name | Alert Condition |
LSInstant Message Conferencing | LS:ImMcu - IMMcu Conferences | IMMCU - Throttled Sip Connections | WARN if >= 2 |
LSInstant Message Conferencing | LS:ImMcu - MCU Health And Performance | IMMCU - MCU Health State | WARN if = 1; ALARM if = 2 or 3 |
LSRegistration | LS:SIP - Peers(*) | SIP - Connections Active | WARN if >= 10000; ALARM if >= 15000 |
LSRegistration / LSEdge | LS:SIP - Load Management | SIP - Average Holding Time For Incoming Messages | WARN if >= 3000 ms; ALARM if >= 6000 ms |
These are the only performance alerts that are generally recommended by Microsoft Lync/Skype for Business experts. The other counters are available for further diagnosis when problems occur.
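As an illustration, the default alert conditions listed above can be expressed as a small lookup in Python. The dictionary layout and the `evaluate` function are hypothetical; only the counter names and threshold values come from the table.

```python
# Default alert conditions, keyed by counter name.
DEFAULT_THRESHOLDS = {
    "IMMCU - Throttled Sip Connections": {
        "warn": lambda v: v >= 2,
    },
    "IMMCU - MCU Health State": {
        "warn": lambda v: v == 1,
        "alarm": lambda v: v in (2, 3),
    },
    "SIP - Connections Active": {
        "warn": lambda v: v >= 10000,
        "alarm": lambda v: v >= 15000,
    },
    "SIP - Average Holding Time For Incoming Messages": {  # milliseconds
        "warn": lambda v: v >= 3000,
        "alarm": lambda v: v >= 6000,
    },
}


def evaluate(counter_name, value):
    """Return 'OK', 'WARN' or 'ALARM' for a counter value."""
    rules = DEFAULT_THRESHOLDS.get(counter_name)
    if rules is None:
        return "OK"  # no default threshold on this counter
    # The alarm condition takes precedence over the warning condition.
    if "alarm" in rules and rules["alarm"](value):
        return "ALARM"
    if rules["warn"](value):
        return "WARN"
    return "OK"
```

With these defaults, 12000 active SIP connections produce a warning and 15000 produce an alarm; counters without a default threshold never alert in this sketch.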
Installation
Prerequisites
To set up the monitoring of Microsoft Lync Server 2013/Skype for Business, make sure the following items are available, installed and properly configured:
- A fully functional BMC PATROL environment (optionally part of a larger BPPM environment), with a BMC PATROL Console
- A PATROL Agent on the Lync/Skype for Business Server itself
- The latest version of Monitoring Studio KM for PATROL, with the latest patches, installed on the agent on the Lync/Skype for Business Server
- Monitoring Studio KM properly loaded on the agent and in the console
- The monitoring template installed in a folder on the Lync/Skype for Business Server itself
Procedure
- From the PATROL Console, [right-click] on the main “Monitoring Studio” icon > [KM Commands] > [Configuration] > [Import Configuration…]
Figure 7 - Import Configuration
- Enter the path of the folder where you have stored the configuration file (this path is on the agent, on the Lync/Skype for Business Server):
Figure 8 - File Location
- Select the configuration file in the list:
Figure 9 - File Selection
- Monitoring Studio checks the content of the file. This process can take a few minutes (the configuration file is rather large).
Figure 10 - Importing
- Monitoring Studio then asks whether the %{PATH} application constant should be cleared. If the default value (“D:\Lync”) matches the installation directory of Microsoft Lync/Skype for Business on the server, click the [Keep values] button. Otherwise, click the [Clear values] button.
Figure 11 - Application Constants
- Monitoring Studio is ready to import the configuration. Click [Finish] to start the import.
Figure 12 - Summary
- The import process can take a few minutes:
Figure 13 - Importing
Figure 14 - Importing
- After the import process completes, Monitoring Studio KM starts creating the icons corresponding to the monitored objects in the PATROL Console. If you chose to clear the %{PATH} application constant, all icons are kept OFFLINE and the actual monitoring does not start until you enter a valid value for the application constant:
Figure 15 - Lync Monitoring in the PATROL Console
- To set the %{PATH} application constant value, [right-click] on the “Microsoft Lync Server” icon in the PATROL Console > [KM Commands] > [Modify Application Constants…]
Figure 16 - Modifying Application Constants
- Next to the %{PATH} constant, enter the path of the folder where Microsoft Lync/Skype for Business has been installed on the system and click [OK]. It is usually C:\Program Files\Microsoft Lync Server or C:\Program Files\Skype for Business Server:
Figure 17 - Application Constant Path
Monitoring Studio brings all the monitored objects ONLINE and the monitoring of Microsoft Lync/Skype for Business Server actually starts. The initialization of the monitoring can take a couple of minutes to complete.
Alternate Installation Procedure
Alternatively, PATROL administrators can use WPCONFIG.EXE, pconfig or PCM (PATROL Configuration Manager) to deploy the configuration file. Once this is done, it is recommended to force a full discovery on the PATROL Agent to make sure Monitoring Studio KM takes into account the new configuration immediately (without waiting for the next discovery cycle, which occurs by default every hour).
Figure 18 - Deploying Configuration File Using wpconfig.exe
Editing the %{PATH} application constant as described above is only required if you wish to monitor the Total Lync/Skype Server Processes. This can also be done by editing the /MASAI/SENTRY8/Lync/constant1Value or /SENTRY/STUDIO/Skype/constant1Value configuration variable before applying the configuration to the selected Lync/Skype for Business servers.
Post-installation tasks
As explained earlier, depending on the role of a Microsoft Lync/Skype for Business Server, different components of Lync/Skype for Business Server have been installed and configured. The monitoring configured in Monitoring Studio covers all components of Lync/Skype for Business and you may need to disable or completely remove certain groups of monitored objects from the monitoring.
The different aspects of Microsoft Lync/Skype for Business are grouped in containers/folders in Monitoring Studio. To identify the components that need to be removed from the monitoring, browse the tree view to find "Windows Service" objects for which the Status parameter cannot be collected (it stays OFFLINE, while the rest of the monitoring is properly collected). In addition, an error message is displayed in the System Output Window for each service that is not installed and therefore cannot be monitored.
You can remove from the monitoring each “container” whose Windows service is not installed and cannot be collected: [right-click] on the container icon > [KM Commands] > [Delete].
Once this is done, you should no longer see any error message in the System Output Window and the monitoring is adapted to your environment.
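If you prefer to prepare the list of containers to remove before browsing the console, the comparison can be sketched in Python as shown below. The mapping reuses a few rows of the Windows services table; the `containers_to_remove` function and the list of installed services are hypothetical, and the actual removal is still done through the [Delete] KM command.

```python
# A few component/service pairs taken from the Windows services table.
MONITORED_SERVICES = {
    "LSArchiving_RTCLOG": "LyncServerArchiving(RTCLOG)",
    "LSMediation_RTCMEDSRV": "LyncServerMediation(RTCMEDSRV)",
    "LSRegistration_RTCSRV": "LyncServerFront-End(RTCSRV)",
    "LSWeb_W3SVC": "WorldWideWebPublishingService(w3svc)",
}


def containers_to_remove(monitored, installed_services):
    """Return the components whose Windows service is not installed."""
    installed = set(installed_services)
    return sorted(
        component
        for component, service in monitored.items()
        if service not in installed
    )


# Hypothetical server that only runs the Front-End and Web roles.
installed = [
    "LyncServerFront-End(RTCSRV)",
    "WorldWideWebPublishingService(w3svc)",
]
to_remove = containers_to_remove(MONITORED_SERVICES, installed)
```

In this example, the Archiving and Mediation containers would be candidates for deletion.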