name: agenda
class: middle, center

# Komodor Academy Workshops

## Accelerating Mean Time to Resolution (MTTR) with Komodor

???

These are my really great slide notes

---

# Schedule and Introductions

| Time          | Topic            | Description                             |
|---------------|------------------|----------------------------------------:|
| 00:00 - 00:10 | Introduction     | Overview of the workshop and objectives |
| 00:10 - 00:30 | Komodor Overview | Why Komodor?                            |
| 00:30 - 00:45 | Account Setup    | Setting up your Komodor account         |
| 00:45 - 01:00 | Hands-On #1      | Observing Service Health                |
| 01:00 - 01:15 | Hands-On #2      | Incident Deep Dive                      |
| 01:15 - 01:30 | Hands-On #3      | Incident Remediation                    |
| 01:30 - ???   | Conclusion       | Thanks for Attending! Q&A               |

---

# Introduction
Why Komodor?
A brief history of ~~the universe~~ Komodor
---
class: middle, center

## Komodor Academy Workshops
Accessing your Lab Instance
---

## Logging into Komodor

Retrieve the credentials provided by your instructor
Navigate to app.komodor.com
---

## Enter your information in the lab's Single Sign On Portal

---

## Observe the services configured for your lab instance

---
class: middle, center

## Komodor Academy Workshops
Hands-On Exercises
---
class: middle, center

## Komodor Academy Workshops
Hands-On Exercise 1: Observing Service Health
---

## Hands-On Exercise 1: Observing Service Health

In Komodor, a service represents a unit of software
In Kubernetes, these are Deployments, StatefulSets, and DaemonSets.
Select the `python-app-error` card to observe this service's events timeline
---

## Hands-On Exercise 1: Observing Service Health Cont'd
Observe the service's events timeline - This is your source of truth
The timeline is often enriched with events from external cloud-native infrastructure tools (Datadog, PagerDuty, etc.)
Your "Single Pane of Glass"
---

## Hands-On Exercise 1: Observing Service Health Cont'd

Let's ask Klaudia a question about this service
Click the `Ask Klaudia` button
Ask Klaudia one of the preselected questions, or type your own
---
class: middle, center

## Komodor Academy Workshops
Hands-On Exercise 2: Deep Diving on the Incident
---

## Hands-On Exercise 2: Deep Diving on the Incident

Select one of the "Pod running (Not Ready)" events
The Pod is continuously looping in a `CrashLoopBackOff` state
This is a very common and frustrating state in Kubernetes
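
To make the looping behavior concrete, here is a minimal sketch of the restart delays behind `CrashLoopBackOff`. It assumes the kubelet defaults described in the Kubernetes documentation (a delay starting at 10 seconds, doubling after each crash, capped at 5 minutes); `restart_delays` is a hypothetical helper name, and real timings are approximate.

```python
def restart_delays(restarts, base=10, cap=300):
    """Approximate wait in seconds before each of the first `restarts` restarts.

    Assumes the documented kubelet defaults: start at `base` seconds,
    double after every crash, and never exceed `cap` seconds.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

The growing delay is why a crashing Pod can appear idle for minutes between attempts.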
---

## Hands-On Exercise 2: Deep Diving on the Incident Cont'd
Return to the Events Timeline
Click "Reliability Violations"
Open the Reliability Violation for this service
---

## Hands-On Exercise 2: Deep Diving on the Incident Cont'd

Return to the Events Timeline
Select `Investigate` to enable Klaudia Root Cause Investigation (this button reappears when the service flaps between healthy and unhealthy)
Klaudia investigates the issue
---

## Hands-On Exercise 2: Deep Diving on the Incident Cont'd
Klaudia takes a multi-step approach to investigate the issue
Klaudia provides a summary of the issue
Klaudia provides supporting evidence of the issue
---

## Hands-On Exercise 2: Deep Diving on the Incident Cont'd

Scrolling down, we see more supporting evidence of the issue
An environment variable is intentionally failing the pod
Most importantly, Klaudia provides a path to remediation
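
For intuition, the failure mode could look something like the sketch below. This is not the lab application's actual source; it only assumes that the process reads `EXIT_CODE` and uses it as its exit status. `resolve_exit_code` is a hypothetical helper.

```python
def resolve_exit_code(environ):
    """Return the exit status the process would use (0 when EXIT_CODE is unset)."""
    raw = environ.get("EXIT_CODE")
    if raw is None:
        return 0  # no override: the container exits cleanly
    try:
        return int(raw)
    except ValueError:
        return 1  # treat an unparsable value as a generic failure


# With the variable set to a non-zero value, every start ends in a failure.
print(resolve_exit_code({"EXIT_CODE": "1"}))  # 1
print(resolve_exit_code({}))                  # 0
```

A real entrypoint would pass this value to `sys.exit`, so Kubernetes would see a failed container and restart it.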
---

## Komodor Academy Workshops
Hands-On Exercise 3: Remediation
---

## Hands-On Exercise 3: Remediation

Klaudia recommends removing the `EXIT_CODE` environment variable
Return to the Events Timeline for the Service
Select the `Describe` button to validate the environment variable
---

## Hands-On Exercise 3: Remediation Cont'd

This is the state of the service as known to Kubernetes
The `EXIT_CODE` environment variable is present
How can we fix this based on Klaudia's recommendation?
---

## Hands-On Exercise 3: Remediation Cont'd

Return to the Service's Events Timeline
Click the three-dot menu, and select `Edit YAML`
You will now be able to edit the Deployment manifest
---

## Hands-On Exercise 3: Remediation Cont'd

Remove the `EXIT_CODE` environment variable from the manifest
Click Apply Changes
This triggers a new deployment
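
The edit itself amounts to dropping one entry from the container's `env` list. Here is a rough sketch on a plain dictionary, assuming a standard Deployment layout; the container name, image, and the extra `LOG_LEVEL` variable are made up for illustration, and `remove_env_var` is a hypothetical helper.

```python
import copy


def remove_env_var(deployment, name):
    """Return a copy of the manifest with `name` removed from each container's env."""
    updated = copy.deepcopy(deployment)
    for container in updated["spec"]["template"]["spec"]["containers"]:
        if "env" in container:
            container["env"] = [e for e in container["env"] if e["name"] != name]
    return updated


# Hypothetical manifest fragment; only the service name comes from the lab.
manifest = {
    "kind": "Deployment",
    "metadata": {"name": "python-app-error"},
    "spec": {"template": {"spec": {"containers": [{
        "name": "app",
        "image": "example/python-app:latest",
        "env": [
            {"name": "EXIT_CODE", "value": "1"},
            {"name": "LOG_LEVEL", "value": "info"},
        ],
    }]}}},
}

fixed = remove_env_var(manifest, "EXIT_CODE")
print(fixed["spec"]["template"]["spec"]["containers"][0]["env"])
```

Because the Pod template changed, Kubernetes rolls out new Pods, which is the new deployment mentioned above.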
---

## Hands-On Exercise 3: Remediation Cont'd
Observe the new deployment events in the timeline
The number of events has increased as well
Select the `Manual Event`
---

## Hands-On Exercise 3: Remediation Cont'd

The deployment event shows all the auditable actions
Quickly visualize what changed
Select `View all 1 Changes on Diff`
---

## Hands-On Exercise 3: Remediation Cont'd
As a convenience, Komodor shows you the diff of the deployment
This shows what was, and what is now present on the cluster
This concludes the remediation exercise!
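
To reproduce a similar before/after view outside the UI, Python's standard `difflib` module can produce a unified diff. The sketch below is not how Komodor computes its diff; the manifest lines are hypothetical, and `manifest_diff` is a made-up helper.

```python
import difflib

# Hypothetical env section before and after the edit.
before = [
    "env:",
    "  - name: EXIT_CODE",
    '    value: "1"',
    "  - name: LOG_LEVEL",
    "    value: info",
]
after = [
    "env:",
    "  - name: LOG_LEVEL",
    "    value: info",
]


def manifest_diff(old_lines, new_lines):
    """Return unified-diff lines describing what was removed or added."""
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile="previous", tofile="current", lineterm=""
    ))


for line in manifest_diff(before, after):
    print(line)
```

Lines prefixed with `-` were present before the change and are gone afterwards.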
---

## Komodor Academy Workshops
Thank You!