Organizing Incident Management

Cyber Criminals understand most of the organizations have their employees working from home, so they are using new different ways to find vulnerabilities, or to extort money. The launch of Covid-19-themed attacks in the form of phishing emails with malicious attachments that use malware to disrupt systems and steal confidential data is on the rise.

According to ‘APWG’s Phishing Activity Trends’ Report for Q1 2020, phishing attacks rose in prevalence to a level that hasn’t been observed since 2016, with over 60,000 phishing sites being reported in March alone.

Source: Comparitech

Recently, there has been several cases of business disruption where video conferencing tools are hacked using the newfound vulnerabilities.

Adding to this, the IT team is under a tough task of securing and monitoring users connecting from personal devices from home networks. Although several advisories have been issued on securing workstation and home networks in public interest, compliance is not easily trackable.

Let’s take the example of Zoom video conferencing application which is being widely used for remote working. Researchers have found the credentials for more than 500,000 Zoom users. This doesn’t really mean Zoom got hacked. What it means is that the accounts that were on sale on the dark web were obtained using “credential stuffing” where the hackers use a combination of email and passwords obtained through previous hacks.

Another example, where Micah Lee and Yael Grauer, writing for The Intercept, (article dated March 31, 2020), reported that Zoom meetings and calls were vulnerable to eavesdropping and weren’t End-to-End Encrypted, despite misleading marketing on various media platforms. The article explores and explains what end-to-end (E2E) encryption is, why it’s important and points out some of the claims that Zoom makes on its website about using end-to-end encryption for video conferencing.

The attackers are taking advantage of the lack of general security awareness and using this situation to launch well-known exploits on vulnerable devices and tinker with unsuspecting places such as router’s DNS configuration, changes that most will most easily go unnoticed.

So how exactly does one get organized for Incident Management and Responses?

Below are few best practices that can be adapted by any company/ organization to ensure efficient incident management.

#1 Define SLA for the IT team

The IT Support teams are either in-house or outsourced. The point where an outsourced model outscores the in-house model is the stringent SLA definition.

Although the IT team falls under the support services category, it needs to be driven like a cost center to ensure that incident response is timely and appropriate. SLA definition will involve defining an acceptable time frame within which any incident needs to be responded to and resolved. A cost component can be also added to improve the competency.

We recommend that while defining SLAs parameters like category, impact, urgency etc. be included to ensure clarity.

Useful Reading : https://blog.hubspot.com/blog/tabid/6307/bid/34212/how-to-create-a-service-level-agreement-sla-for-better-sales-marketing-alignment.aspx

#2 Defining a workflow:

A workflow is very crucial for timely and appropriate response upon incident detection. It channelizes the roles and responsibilities of all the members in the incident response team.

We recommend defining the following as part of the workflow:

Define roles and responsibility matrix: Assign responsibility based on the type of incident / asset affected. Further identify the responsibilities of L1, L2 and L3 teams.
Defining turnaround times: Assign response timelines for response team and cutoff timelines for escalation and redressal.
Defining a mailing list: keep a mailing list with all the relevant stakeholders who need to be notified about an incident.
Implement the workflow in the incident reporting tool: Automate the workflow for ticket assignment, SLA monitoring, escalation and redressal in the ticketing system. This will clear up the response team’s time to more crucial tasks at hand. Helps in avoiding process slippages.

#3 Maintaining Proper Documentation:

Proper documentation to capture the details of the incidents and the actions taken for resolution will prove to be useful for any investigation / forensic analysis. It ensures that time is not wasted in re-inventing the wheel for incidents that have previously occurred and resolved successfully.

More Documentation –> Prevents Communication gaps

More Documentation -> creates more records

More Records -> Generates more accurate information on the resolution.

We strongly recommend maintaining an Incident Wiki: Most organizations have a knowledge base of all incidents and incident resolutions – having them indexed can be useful for people to refer to when a similar incident occurs, increase in efficiency comes without saying.

There are many benefits of having an incident wiki rather than having a paper trail/individual documents for incidents:

Maintaining a paper trail / individual document could take time, they take up space that could be used for other things, and eventually after some time someone will need to destroy old records that are no longer useful but with an incident wiki you don’t need to destroy old records.
It can be a used as a tool that allows employees to go through an incident and monitor and contribute to the solution without clogging email inboxes.
An incident Wiki can be an area where multiple employees (often in different departments or different locations) can work simultaneously on one issue.
And most importantly having a digital incident wiki saves trees.

Useful Reading : https://www.makeuseof.com/tag/4-sites-create-wikipedialike-website/

#4 Defining the roadmap for Incident Resolution:

We commonly find that incident management workflow misses out defining components for incident resolution. This is critical to ensure that the resolution is fool-proof and that the resolution is based on advisories from authorised sources.

We strongly recommend:

Testing the resolution: If the resolution involves configuration / code level changes, we recommend the IT team to test the change on the test environment prior to deploying the same on production.
Identifying a roll-back option: Not all resolution efforts pay back successfully. There is a good chance of the configuration / code change affecting the integrity of the data even after successful testing on the test environment. Restoring the system to the last known good state should be the default recovery plan. The last known good state of the system is the known and verified state of the system
Conducting a postmortem: What do we do when an incident recurs frequently? Is there more than what meets the eye. This can be achieved by conducting a postmortem analysis of frequently recurring incidents and understand the common thread. This investigation will help to understand the root cause of the problem and permanently fix the incident.

This unexpected shift towards working remotely has bought in many new risks. And while each organization has their own ways to tackle these circumstances, the above-mentioned steps can help you get started in the right direction.

The above pointers can help the teams working from home to improve their incident response procedure, achieve a good level of readiness for new incidents along with giving the team the right amount of confidence to handle incidents. While working remotely for SOC, NOC, Support role can be overwhelming at the start, it only gets better with every passing day.

As the we gear up along with the entire world to fight this pandemic, we continue to be inspired by our frontline workers and by others who are caring for people around the world.