52 questions

Mid questions

Every rated question at this level, grouped by topic and tagged with the company it came up at.

CI/CD 9 questions

Automated GitOps Promotion
CI/CD · CapitalOne · mid
Scenario

You are setting up a GitOps pipeline. repo-a contains the application source code, and repo-b contains the Kubernetes configuration (values.yaml).

When code is pushed to repo-a, the pipeline must:
1. Build a Docker image tagged with the Short Git SHA.
2. Automatically update repo-b to use this new image tag.
Task

Edit the existing workflow file at /home/interview/repo-a/.github/workflows/promote.yml to complete the pipeline.
1. Calculate SHA: Finish the step to extract the first 7 characters of $GITHUB_SHA into an environment variable named SHORT_SHA.
2. Build: Build a Docker image named app tagged with ${{ env.SHORT_SHA }}.
3. Checkout Infrastructure: Configure the checkout step to clone interview/repo-b into a directory named infra.
4. Promote: Inside the infra directory:
  - Update values.yaml so that the tag key reflects the new SHORT_SHA.
  - Commit the change with the message: Update tag to <SHORT_SHA>.
Automated Rollback on Deployment Failure with Values File Restoration
CI/CD · NVIDIA · mid

Scenario

A repository at /home/interview/deploy-repo contains a values.yaml file tracking the current Docker image tag and a deploy.sh script that simulates production deployment. When deployments fail, the team must manually revert values.yaml to the previous working tag. An automated rollback mechanism is needed to detect failures and restore the last known-good tag with a commit by CI Bot. A starter workflow file has been created at .github/workflows/deploy.yml with the basic structure.

Task

Navigate to /home/interview/deploy-repo and complete the GitHub Actions workflow at .github/workflows/deploy.yml. It must trigger on push to the main branch and define two jobs:
1. A deploy job that runs ./deploy.sh.
2. A rollback job that depends on the deploy job, runs only when deployment fails, restores values.yaml to its state from the previous commit, and commits the change with author "CI Bot" [email protected] and message chore: automatic rollback due to deployment failure.
Test with: ./github-ci push
Docker Image Tagging with Commit SHA
CI/CD · Microsoft · mid

Scenario

A repository at /home/interview/repo contains a Dockerfile but no automated build process. Developers manually build Docker images without consistent tagging, making it difficult to track which image corresponds to which code version. A starter workflow file has been created at .github/workflows/build.yml with the basic structure.

Task

Navigate to /home/interview/repo and complete the GitHub Actions workflow at .github/workflows/build.yml so that it automatically builds a Docker image named app and tags it with the short commit SHA on every push to the main branch. The pipeline should ensure each commit produces a uniquely identifiable image. The CI pipeline can be executed and tested with the ./github-ci push command.
GitHub Actions Matrix Build Strategy
CI/CD · Spotify · mid

Scenario

A repository at /home/interview/repo contains a Node.js application with a test suite at ./run-tests.sh. The application needs to be tested across multiple Node.js versions (18, 20, and 22) to ensure compatibility, but currently has no automated testing configured. A starter workflow file has been created at .github/workflows/pr-tests.yml with the basic structure. The repository has the upload-artifact action available locally at .github/actions/upload-artifact for artifact management.

Task

Navigate to /home/interview/repo and complete the GitHub Actions workflow at .github/workflows/pr-tests.yml that runs the test suite across Node.js versions 18, 20, and 22 using a matrix strategy. Each matrix job should create a node-version.txt artifact containing its Node.js version number. The workflow should execute on pull requests. The workflow can be tested with ./github-ci pull_request and artifacts will be available in /tmp/github-artifacts/1/ for verification.

Important: Use container images (node:${{ matrix.node-version }}-slim) instead of setup-node actions to configure Node.js versions.
Job Dependency Enforcement
CI/CD · Splunk · mid

Scenario

A repository at /home/interview/repo contains a workflow file .github/workflows/pipeline.yml. The workflow currently defines three jobs: lint, test, and build. These jobs are configured to run in parallel without any dependencies.

Task

Update the pipeline to enforce a strict execution sequence. Configure the test job to depend on the successful completion of lint. Configure the build job to depend on the successful completion of test. The final execution order must be Lint → Test → Build. The pipeline execution can be verified with ./github-ci push.
Multi-Job Workflow with Artifact Handoff
CI/CD · Anthropic · mid
Scenario

A repository at /home/interview/repo contains a Node.js application with a test suite at ./run-tests.sh. The development team needs a two-stage CI/CD pipeline where the first job runs tests and generates test results, and a second job downloads those results to create a summary report. The repository has upload-artifact and download-artifact actions available locally at .github/actions/ for artifact management. A starter workflow file has been created at .github/workflows/artifact-handoff.yml with the basic structure.

Task

Navigate to /home/interview/repo and complete the GitHub Actions workflow at .github/workflows/artifact-handoff.yml that implements a two-job pipeline triggered on pull_request events using container images (node:20-slim):
1. Job A (test-job): Run the test suite and upload the test-results.txt file as an artifact named "test-results" using ./.github/actions/upload-artifact
2. Job B (report-job): Download the test-results artifact using ./.github/actions/download-artifact, verify it exists, count passed tests with grep -c "PASS:", save the count to summary.txt in format "Total Passed Tests: X", and upload as "test-summary" artifact. Job B must depend on Job A using the "needs" keyword.
The workflow can be tested with: ./github-ci pull_request
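The counting step in Job B can be sketched in plain shell. The contents of test-results.txt below are invented sample data, and the temporary directory stands in for the location where the artifact was downloaded.

```shell
workdir="$(mktemp -d)"

# Hypothetical test output; the real file comes from ./run-tests.sh in Job A.
cat > "$workdir/test-results.txt" <<'EOF'
PASS: adds two numbers
PASS: handles empty input
FAIL: rejects negative values
EOF

# Verify the downloaded artifact exists before using it.
if [ ! -f "$workdir/test-results.txt" ]; then
  echo "test-results.txt not found" >&2
  exit 1
fi

# grep -c prints the number of matching lines; it exits non-zero when the
# count is 0, so "|| true" keeps strict shells from aborting in that case.
passed="$(grep -c "PASS:" "$workdir/test-results.txt" || true)"
echo "Total Passed Tests: $passed" > "$workdir/summary.txt"
cat "$workdir/summary.txt"
```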
Path-Based Workflow Execution
CI/CD · Coinbase · mid

Scenario

A repository at /home/interview/repo contains a multi-component application with infrastructure code in the /infra directory and documentation in the /docs directory. Currently, the repository has no workflow that uses path-based filtering to run only when specific files change. A starter workflow file has been created at .github/workflows/infra-check.yml with the basic structure. The repository has the upload-artifact action available locally at .github/actions/upload-artifact for artifact management.

Task

Navigate to /home/interview/repo and complete the GitHub Actions workflow at .github/workflows/infra-check.yml that runs only when files under the /infra directory change. The workflow should trigger on push events, use container image (node:20-slim), run a validation script at ./validate-infra.sh, and create an artifact named "validation-report" containing validation-result.txt using ./.github/actions/upload-artifact. The workflow can be tested with ./github-ci push and artifacts will be available in /tmp/github-artifacts/1/ for verification.
PR Test Gate
CI/CD · Adobe · mid

Scenario

A repository at /home/interview/repo has a test suite located at ./run-tests.sh but no automated testing on pull requests. Developers must manually run tests before merging, leading to inconsistent test coverage and occasional bugs in the main branch. A starter workflow file has been created at .github/workflows/pr-tests.yml with the basic structure. The repository already has the upload-artifact action available locally at .github/actions/upload-artifact for artifact management.

Task

Navigate to /home/interview/repo and complete the GitHub Actions workflow at .github/workflows/pr-tests.yml that triggers on pull request events, runs the test suite ./run-tests.sh, and uploads the test results from test-results.txt as an artifact named test-results using ./.github/actions/upload-artifact. The artifact upload should run even if tests fail using if: always(). Test with: ./github-ci pull_request and artifacts will be available in /tmp/github-artifacts/1/ for verification.
Reusable Workflow with Input Parameters
CI/CD · Adobe · mid

Scenario

A repository at /home/interview/repo contains multiple applications that share common build and test steps. The development team wants to eliminate code duplication across workflows by creating a centralized reusable workflow that can be called from different workflows with different parameters.

Task

Create two GitHub Actions workflows:
1. A reusable workflow at .github/workflows/shared-build.yml that accepts an input parameter app-name, runs the build script ./build.sh with the app name, and uploads a build artifact named "build-{app-name}".
2. A caller workflow at .github/workflows/deploy.yml that triggers on push events, calls the reusable workflow with app-name: "frontend", and uses container image (node:20-slim).
You can test with: ./github-ci push

Cloud 1 question

Design Egress Only VPC with NAT
Cloud · Twitch · mid
Scenario

We need to prepare infrastructure for ECS tasks and EC2 instances. It must span at least two Availability Zones. These workloads require outbound internet access to download updates and call external APIs, but inbound access is not allowed. Additionally, the application should send data to S3 in a cost-effective way, so we have to deploy the necessary infrastructure for that traffic too.

Note: You are required to design the VPC networking architecture only. Creation of ECS clusters, services, or EC2 instances is not part of this task.

Task

Design and implement a VPC network architecture with the CIDR block 10.0.0.0/16 that meets the following requirements:
1. Network Isolation: Workloads must reside in subnets across two Availability Zones with no public IP addresses and must not be directly reachable from the internet.
2. Egress Control: Workloads must be able to initiate outbound connections to the public internet over HTTP and HTTPS. The internet must not be able to initiate connections to them. Note: You should create and configure a new non-default security group.
3. Egress Restrictions: Outbound traffic from workloads should be limited to only the required protocols and destinations.
4. Cost Awareness: The architecture should account for cost-efficient routing when accessing AWS-managed services, minimizing unnecessary NAT Gateway usage.
Note: You can use either the AWS Management Console or AWS CLI to complete this task.

Containers 5 questions

Graceful Shutdown with SIGTERM Handling
Containers · Robinhood · mid

Scenario:

You have a container image myapp:grace built from /home/interview/Dockerfile that runs an application requiring cleanup on shutdown. When you run docker stop, the container is killed after the 10-second timeout instead of shutting down gracefully because the application doesn't properly handle the SIGTERM signal.

Task:

Fix the application script and Dockerfile to properly handle SIGTERM signals, implement cleanup logic, and ensure the container exits gracefully within 20 seconds when docker stop is executed.
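One common way to handle SIGTERM in a shell entrypoint is a trap that runs cleanup and exits. The sketch below is an illustration under assumptions: app.sh, LOG_FILE, and the cleanup body are hypothetical stand-ins for the real application script, and the demo sends the signal with kill instead of docker stop.

```shell
workdir="$(mktemp -d)"

# Hypothetical stand-in for the application script inside the image.
cat > "$workdir/app.sh" <<'EOF'
#!/bin/sh
cleanup() {
  echo "cleanup done" >> "$LOG_FILE"
  exit 0
}
# Run cleanup when the process receives SIGTERM (what docker stop sends first).
trap cleanup TERM INT
echo "started" >> "$LOG_FILE"
# Sleep in the background and wait on it, so the trap fires promptly
# instead of after a long foreground command finishes.
while true; do
  sleep 1 &
  wait $!
done
EOF

# Start the script, give it time to install the trap, then send SIGTERM.
LOG_FILE="$workdir/app.log" sh "$workdir/app.sh" &
app_pid=$!
sleep 2
kill -TERM "$app_pid"
wait "$app_pid"
exit_code=$?

echo "exit code: $exit_code"
cat "$workdir/app.log"
```

For the signal to reach the script in a container, the script generally needs to run as PID 1, which is why the exec form of ENTRYPOINT or CMD (a JSON array) is usually preferred over the shell form.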

Insecure Container Root User
Containers · Accenture · mid
Scenario

A Dockerfile at /home/interview/Dockerfile builds a Python web application tagged as myapp:secure. The container runs as the root user (UID 0) with extensive Linux capabilities, violating the principle of least privilege.

Task

Harden the Dockerfile by creating a non-root user appuser with UID 10001, switching to that user for application execution, ensuring application files have appropriate permissions, rebuilding the image with the tag myapp:secure, and verifying the container runs with reduced privileges while maintaining full functionality.

Example
```
# Before (running as root)

uid=0(root) gid=0(root) groups=0(root)
```
```
# After (running as non-root user)

uid=10001(appuser) gid=10001(appuser) groups=10001(appuser)

# Reduced capability set present
# Application responds with non-root user confirmation

$ curl http://localhost:5000/health
{"status":"healthy","uid":10001,"user":"appuser"}
```

Memory Limit and OOM Killer
Containers · DeliveryHero · mid
Scenario

A container named mem_test running from image myapp:mem contains a memory stress script at /app/stress_memory.sh. Currently, the container runs with no memory limit and can consume as much RAM as is available on the host, so there is no container-level limit for the OOM (Out of Memory) killer to enforce, even when the script allocates excessive memory.

Task

Apply a 100MB memory limit to the container so that running the provided stress script causes the container to be OOM-killed. Stop the container, restart it with the memory limit applied using the appropriate flags, run the stress script, and verify the container gets OOM-killed when the memory limit is exceeded by checking the OOM kill status.

Note: The container mem_test and image myapp:mem with the stress script are already available.

Example
```
# Before (no memory limits)
$ docker inspect mem_test --format 'Memory: {{.HostConfig.Memory}}'
Memory: 0

$ docker inspect mem_test --format 'OOMKilled: {{.State.OOMKilled}}'
OOMKilled: false
```
```
# After (memory limits applied and stress script triggers OOM)
$ docker inspect mem_test --format 'Memory: {{.HostConfig.Memory}}'
Memory: 104857600

$ docker inspect mem_test --format 'OOMKilled: {{.State.OOMKilled}}'
OOMKilled: true

$ docker inspect mem_test --format 'ExitCode: {{.State.ExitCode}}'
ExitCode: 137
```
Optimize Dockerfile
Containers · Shopify · mid
Scenario

A Dockerfile at /home/interview/Dockerfile successfully builds a Go application, but the resulting Docker image is over 800MB in size due to including the full Go toolchain, build dependencies, and source code. Production images must be under 200MB.

Task

Rewrite the Dockerfile using a multi-stage build pattern to separate the build environment from the runtime environment. Use Alpine as the base image for the final runtime stage, ensure only the compiled binary and necessary runtime dependencies are copied to the final stage, build the optimized image with the tag myapp:fixed, and verify the final image size is below 200MB while maintaining full functionality.

Example
```
# Before (bloated image)

REPOSITORY    TAG       SIZE
myapp         original  550MB
```
```
# After (optimized multi-stage build)

REPOSITORY    TAG       SIZE
myapp         fixed     45.3MB
```
Storage Driver Performance Fuse Overlayfs
Containers · Github · mid

Scenario:

You have a container image myapp:fs built from /home/interview/Dockerfile that performs intensive filesystem write operations. Docker is currently using the overlayfs storage driver and write performance is a bottleneck.

Task:

Configure the Docker daemon in /etc/docker/daemon.json to use a configuration appropriate for a write-heavy setup. Restart the Docker daemon and rebuild the myapp:fs image for validation testing.

Git 9 questions

Create an Annotated Tag
Git · Nintendo · mid

Scenario:

You have a Git repository at /home/interview/repo where you've just completed version 3.1.0 of your application. You need to create an annotated tag to mark this release with a proper message and metadata.

Task:

Create an annotated tag named v3.1.0 with a descriptive message and push it to the remote repository origin.

Fix Repository with Unrelated Histories
Git · Zscaler · mid
Scenario:

The repository at /home/interview/repo is in a broken state. Local and remote branches have diverged with no common ancestor. Consequently, git push origin main fails with a non-fast-forward error, and git pull origin main fails because the histories are unrelated.

Task:

Navigate to /home/interview/repo. Merge and linearize the unrelated histories (using rebase) to create a single commit sequence.

Example:
```
# Before (Broken)
$ git pull origin main
fatal: refusing to merge unrelated histories

# After (Fixed - Linear History)
$ git pull origin main
Already up to date.

$ git log --oneline --decorate
e8f9g0h (HEAD -> main, origin/main) Add local feature B
d7e8f9g Add local feature A
b5c6d7e Add remote config
a4b5c6d Remote initial commit
```
Merge Repositories Preserving Both Histories
Git · Zscaler · mid
Scenario:

You have two separate Git repositories at /home/interview/repo-a (5 commits) and /home/interview/repo-b (4 commits) developed independently. Create a new monorepo at /home/interview/monorepo that combines both repositories into separate subdirectories using subtree (project-a/ and project-b/) while preserving the full commit history from both repositories.

Example:
```
# Before (two separate repositories)
$ cd /home/interview/repo-a && git log --oneline | wc -l
5
$ cd /home/interview/repo-b && git log --oneline | wc -l
4
```
```
# After (combined monorepo with both histories)
$ cd /home/interview/monorepo && ls -la
project-a/  project-b/  .git/
$ git log --all --oneline | wc -l
12
```
Recover Lost Commits from Detached Head
Git · Kayak · mid
Scenario:

You have a Git repository at /home/interview/repo where you were in a detached HEAD state, made 3 commits, then switched back to the main branch. Those 3 commits are now unreachable and appear to be lost since no branch references them.

Task:

Navigate to the repository at /home/interview/repo, check the reflog to locate the lost commits from the detached HEAD state, and create a new branch called recovered-work pointing to those commits.

Example:
```
# Before (commits lost, only main visible)
$ git log --oneline -3
c3d4e5f Main branch work
b2c3d4e Second commit
a1b2c3d Initial commit
$ git branch
* main
```
```
# After (commits recovered in new branch)
$ git branch
* main
  recovered-work
$ git log recovered-work --oneline -3
f6a7b8c Third detached commit
e5f6a7b Second detached commit
d4e5f6a First detached commit
```
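The recovery can be reproduced in a throwaway repository. This is a sketch under assumptions: the demo identity and the empty commits are invented, and HEAD@{1} is the right reflog entry only because nothing else happened after switching back to main. In the real repository, read the git reflog output and pick the entry for the last detached commit.

```shell
# Throwaway identity for the demo commits (assumption, not part of the task).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # make the first branch "main" on any git version

git commit -q --allow-empty -m "Initial commit"

# Recreate the situation: three commits made on a detached HEAD, then a switch back to main.
git checkout -q --detach
git commit -q --allow-empty -m "First detached commit"
git commit -q --allow-empty -m "Second detached commit"
git commit -q --allow-empty -m "Third detached commit"
git checkout -q main

# The reflog still records where HEAD has been, including the detached commits.
git reflog -n 5

# HEAD@{0} is the checkout back to main; HEAD@{1} is the last detached commit.
git branch recovered-work 'HEAD@{1}'
git log recovered-work --oneline -3
```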
Remove File from Entire Git History
Git · Netflix · mid
Scenario

A file named secrets.env containing sensitive credentials exists in the repository's commit history. You need to purge this file from the entire history.

Task

Navigate to /home/interview/repo and rewrite the commit history to permanently remove secrets.env from all commits.

Example
```
# Before (Sensitive data in history)
$ git log --all --oneline -- secrets.env
a1b2c3d Add secrets file

# After (Clean history)
$ git log --all --oneline -- secrets.env
(no output)
```
Restore File to Previous Version
Git · Slack · mid

Scenario:

You have a Git repository at /home/interview/repo where the config.js file has been modified in the last two commits, but those changes introduced bugs. You need to restore only config.js to the version it had 2 commits ago without affecting any other files.

Task:

Restore config.js to its state from 2 commits ago, then stage and commit this change with the message Restore config.js.
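A single-file restore can be tried in a throwaway repository. The file contents, the second file other.txt, and the demo identity below are assumptions made for illustration. The sketch uses git checkout with a path, which updates both the working tree and the index for that file only.

```shell
# Throwaway identity for the demo commits (assumption, not part of the task).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

# Three commits: a good config.js, then two commits that change it.
echo "version 1" > config.js
echo "notes" > other.txt
git add . && git commit -q -m "Initial config"

echo "version 2" > config.js
echo "more notes" >> other.txt
git add . && git commit -q -m "Change config (buggy)"

echo "version 3" > config.js
git add . && git commit -q -m "Change config again (buggy)"

# Take config.js from two commits back; other files are left untouched.
# Checking out a path also stages it, so the commit can follow directly.
git checkout HEAD~2 -- config.js
git commit -q -m "Restore config.js"

cat config.js
git log --oneline -4
```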

Stash Work Fix Bug Restore and Update

Git · IBM · mid

Scenario

Uncommitted changes on feature-ui prevent you from switching branches to fix a critical bug on main.

Task

Stash local changes with the message WIP: UI improvements to clean the working directory. Switch to a new branch hotfix-auth to implement the fix, then merge it into main. Finally, rebase feature-ui against the updated main and restore your stashed changes.

Changes in hotfix-auth branch are below:

```
# Append the fix and commit it (-a stages the modified tracked file)
echo "Fixing critical bug" >> src/auth.js
git commit -am "Fix critical authentication bug"
```

Example

```
# Before (Switch blocked)
error: Your local changes to the following files would be overwritten by checkout:
        src/ui.js
Please commit your changes or stash them before you switch branches.

# After (Updated and Restored)
$ git log --oneline -1 main
a1b2c3d Fix critical authentication bug
$ git status
On branch feature-ui
Changes not staged for commit:
  modified:   src/ui.js
```

Stash Work Fix Bug Resume
Git · Kraken · mid

Scenario

You have a Git repository at /home/interview/repo where you are working on a new feature on the feature-login branch with uncommitted changes in login.js. A fix is needed on the dev branch: app.js contains a JavaScript syntax error that can be identified by running node app.js.

Task

Stash your current changes to clean the working directory, switch to the dev branch, fix the syntax error in app.js, commit the fix using the commit message Fix syntax error in app.js, then return to your feature-login branch, merge the fix from dev, and restore your stashed work.
Update Submodule to Latest Commit
Git · GoDaddy · mid

Scenario:

You have a Git repository at /home/interview/repo that contains a submodule in the vendor/utils directory. The submodule is pointing to an old commit, but newer commits exist on the submodule's remote repository.

Task:

Update the submodule to the latest commit on its default branch and commit this change in the parent repository.

Kubernetes 13 questions

ConfigMap Reload With Sidecar

Kubernetes · Yelp · mid

Scenario

Your application needs to react to configuration changes without requiring a pod restart.

Task

Create a ConfigMap and a pod with a sidecar that watches for configuration file changes. Mount the ConfigMap to both containers.

Property	Value
Namespace	`default`
ConfigMap	`app-config`
ConfigMap key	`settings.conf`
ConfigMap value	`debug=false`
Pod	`app-pod`
Main container	`app`
Main image	`nginx:1.24`
Sidecar container	`config-watcher`
Sidecar image	`busybox`
Mount path	`/etc/config`

Sidecar container snippet is at /home/interview/watcher.yaml.

Crashing Misconfigured Pod
Kubernetes · Reddit · mid

Scenario

You have a Kubernetes cluster where the deployment webapp in namespace prod is stuck in CrashLoopBackOff.

The application exposes a health check endpoint at /healthz on port 8080. The container image already includes a config directory with the required files, but the deployment configuration has issues preventing the pod from starting correctly.

Because the config directory already exists and is used by the application, mounting a ConfigMap over the entire directory would overwrite it and cause the app to fail. The deployment needs to be fixed so configuration is injected without replacing the existing directory.

Task

Fix the pod so it reaches Running state with 1/1 Ready

CronJob Schedule Misconfiguration
Kubernetes · RedHat · mid
Scenario

A CronJob named cleanup in the ops namespace is failing to trigger as expected. It has an incorrect schedule, relies on the default timezone (which may not match the server), and retains too many completed jobs, cluttering the history.

Task

Fix the cleanup CronJob so that validation confirms it triggers exactly once per minute. Update the schedule to * * * * *, set the timezone to Etc/UTC, and ensure only the most recent successful run is retained.

Example

Current Status (Failing):
```
NAME      SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cleanup   0 0 1 1 *   False     0        <none>          5m
```
Target Status (Success):
```
NAME      SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cleanup   * * * * *   False     0        10s             7m
```
Custom Resource Definition Setup
Kubernetes · ActivisionBlizzard · mid

Scenario

You need to extend the Kubernetes API to support a proprietary resource type called "Widget".

Task

Create a CustomResourceDefinition named widgets.mycompany.io. Define the group as mycompany.io and the kind as Widget, with the short name wd. Specify the scope as Namespaced and define a version named v1 with served: true and storage: true. Create a Custom Resource instance of this type named sample-widget in the extensions namespace. Verify that the API accepts the new resource and that you can list it.

DNS Based Service Discovery
Kubernetes · Netflix · mid
Scenario

In namespace disco, the application discovery-app relies on DNS-based discovery of peer pods. The service discovery-svc is supposed to return all backend pod IPs, but currently:
```
kubectl exec deploy/testpod -n disco -- nslookup discovery-svc.disco.svc.cluster.local
```
returns only one IP instead of individual pod IPs. The application cannot discover its peers and fails to form a cluster.

Task

Fix the service configuration so that DNS resolution of discovery-svc returns one IP per pod, allowing the application to discover all running instances.

Property	Value
Namespace	`disco`
Deployment	`discovery-app` (3 replicas)
Service	`discovery-svc`
Pod label	`app=discovery`
Fix Job With ServiceAccount and RBAC Permission Issues
Kubernetes · SAP · mid
Scenario

You have a Kubernetes cluster where a job named data-loader in namespace ops fails with permission denied when calling the Kubernetes API. The job manifest file is located at /home/interview/data-loader-job.yaml.

Task

Fix the RBAC configuration and job specification so the job can successfully access the Kubernetes API to get and list pods.

Example
```
# Before (job fails)
$ kubectl get jobs -n ops
NAME          COMPLETIONS   DURATION   AGE
data-loader   0/1           2m         2m
```
```
# After (job completes)
$ kubectl get jobs -n ops
NAME          COMPLETIONS   DURATION   AGE
data-loader   1/1           15s        30s
```
Image Pull Backoff Secrets
Kubernetes · Datadog · mid

Scenario

A Deployment named backend in the dev namespace is failing to start. Pods are stuck in ImagePullBackOff due to configuration issues.

Task

Fix the Deployment so pods successfully pull the image and enter the Running state.

Image Information

Property	Value
Registry	ghcr.io
Repository	prepare-sh/alpine
Version	3.23.2
Architecture	amd64
Size	8.44 MB
Username	preparesh-bot
Access Token	`<provided-in-terminal>`

Implement StatefulSet With Stable DNS
Kubernetes · Okta · mid

Scenario

You are deploying a distributed database named dns-app in the dev namespace. This application requires that each Pod be addressable by a predictable, unchanging hostname (e.g., dns-app-0.dns-app) so the nodes can find each other for data replication.

Task

Create a Headless Service named dns-app in the dev namespace. Create a StatefulSet named dns-app with 3 replicas using the image nginx:alpine. Ensure the StatefulSet uses the dns-app service for network identity. Verify that pods have stable DNS names like dns-app-0.dns-app.

Use kubectl exec -it netshoot -n dev -- nslookup dns-app-0.dns-app to validate resolution.

Multi Tenant Namespace Isolation

Kubernetes · Palantir · mid

Scenario

Two teams share a cluster and require strict isolation with specific exceptions for inter-team communication.

Task

Configure network isolation and resource constraints for both teams:

1. Create default-deny NetworkPolicies for both namespaces (deny all ingress and egress traffic).
2. Create a NetworkPolicy allowing team-a pods to access team-b pods labeled app=api on port 8080 only.
3. Create LimitRanges in both namespaces to enforce maximum resource limits per container.

Property	Value
Namespace 1	`team-a`
Namespace 2	`team-b`
Allowed communication	`team-a` → `team-b` pods with label `app=api` on port `8080` only
Default traffic	Deny all other cross-namespace traffic
Max CPU per container	`1`
Max Memory per container	`512Mi`

Note: Test pods are already deployed - client in team-a, and api + web in team-b.

OOMKilled Pod Analysis Fix
Kubernetes · Accenture · mid

Scenario

A memory-intensive application named oom-demo is repeatedly crashing. The pod status shows CrashLoopBackOff, but you need to confirm the underlying cause is an "Out Of Memory" (OOM) error and fix it by increasing the memory limit.

Task

Inspect the existing pod oom-demo in the apps namespace. Confirm the termination reason is OOMKilled. Update the pod definition to increase the memory limit from 20Mi to 100Mi. Verify the pod stabilizes and enters the Running state.

Secure Internal Service Communication

Kubernetes · Dropbox · mid

Scenario

An application requires TLS certificates for internal service communication.

Task

Set up cert-manager to issue a valid TLS certificate, using a SelfSigned ClusterIssuer to bootstrap a CA Issuer.

Property	Value
Namespace	`preparesh`
SelfSigned ClusterIssuer	`selfsigned-issuer`
CA Certificate name	`ca-cert`
CA secret name	`ca-secret`
CA Issuer name	`ca-issuer`
CA Issuer CN	`preparesh-ca`
Certificate name	`web-cert`
Certificate secret	`web-cert-tls`
DNS names	`web.preparesh.svc`, `web.preparesh.svc.cluster.local`

Template files available at /home/interview/.

StorageClass and PVC Expansion
Kubernetes · Datadog · mid

Scenario

The cluster has no StorageClass configured for dynamic volume expansion.

Task

Create a StorageClass with volume expansion enabled, create a PVC and Pod using it, then expand the PVC from 1Gi to 2Gi.

Property	Value
Namespace	`storage`
StorageClass name	`fast-sc`
PVC name	`expand-pvc`
Initial PVC size	`1Gi`
Expanded PVC size	`2Gi`
Pod name	`storage-pod`
Pod image	`nginx`
Mount path	`/data`

Resource templates are available at /home/interview/.

Traffic Splitting With Native Kubernetes
Kubernetes · Palantir · mid

Scenario

Your team has a stable deployment app-v1 running in namespace canary. They want to test a new version (v2) with approximately 1/3 of traffic going to v2 and 2/3 remaining on v1.

Task

Implement canary traffic splitting between both versions using only native Kubernetes resources.

Property	Value
Namespace	`canary`
Existing deployment	`app-v1`
Canary deployment	`app-v2`
Canary image	`nginx:1.25`
Service name	`my-app-svc`
Service port	`80`

A reference deployment file is available at /home/interview/deployment.yaml.

Linux 10 questions

Analyzing Log Partition Usage
Linux · RedHat · mid
Scenario

Log rotation has stopped working correctly, and you suspect that /var/log might be mounted on a different filesystem with limited space or incorrect mount options.

Task

Determine which filesystem or device /var/log is mounted on, including device name, mount point, filesystem type, size, and usage. Save the findings to /home/devops/varlog_filesystem_info.txt. Optionally, identify whether this filesystem differs from /, which could provide additional information on the causes of log rotation or space issues.

Example
```
# After (filesystem identified and analyzed)

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       10G   9.8G  200M  98% /var/log

Mount details:
/dev/sdb1 on /var/log type ext4 (rw,relatime)
```
Debug SSH Lockout
Linux · TCS · mid
Scenario

The developer account dev has been locked out of the server. Security logs indicate the SSH daemon's authentication failure limit was triggered.

Task

Check the logs to count exactly how many times the user failed today. Update the SSH configuration to increase the allowed login attempts above that number.

Example
```
root@server:~# tail /var/log/auth.log
Dec 23 09:12:01 server sshd[2201]: Failed password for dev from 10.0.0.5 port 5432 ssh2
Dec 23 09:12:05 server sshd[2201]: Failed password for dev from 10.0.0.5 port 5432 ssh2
Dec 23 09:12:08 server sshd[2201]: Failed password for dev from 10.0.0.5 port 5432 ssh2
```
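
A minimal sketch of the counting and the configuration change. It assumes the traditional syslog line format shown above and that the limit in question is sshd's `MaxAuthTries` option. The function names are illustrative.

```shell
# Count failed password attempts for a user on a given day.
# day is the syslog date prefix, e.g. "Dec 23"; it defaults to today.
count_failures() {
    user=$1
    log=$2
    day=${3:-$(date '+%b %e')}
    # grep -c prints 0 but exits non-zero when nothing matches
    grep -c "^$day .*sshd.*Failed password for $user from " "$log" || true
}

# Set MaxAuthTries in an sshd_config-style file, replacing an existing
# (possibly commented-out) line or appending one if none exists.
set_max_auth_tries() {
    cfg=$1
    tries=$2
    if grep -qE '^[#[:space:]]*MaxAuthTries[[:space:]]' "$cfg"; then
        sed -i -E "s/^[#[:space:]]*MaxAuthTries[[:space:]].*/MaxAuthTries $tries/" "$cfg"
    else
        printf 'MaxAuthTries %s\n' "$tries" >> "$cfg"
    fi
}

# Usage (as root):
#   n=$(count_failures dev /var/log/auth.log)
#   set_max_auth_tries /etc/ssh/sshd_config $((n + 1))
#   systemctl restart sshd     # the unit may be named ssh on Debian/Ubuntu
```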
Detect Memory Leak by Monitoring RSS
Linux · Google · mid
Scenario

One of your long-running Node.js services (process name node) has been slowing down over several hours of uptime. CPU usage is normal, disk I/O is normal, but the server is gradually running out of memory.

Task

Identify if any process is leaking memory and kill that process.
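
A leak typically shows up as resident memory (RSS) that keeps rising between samples. The sketch below is one way to check for that. The helper names, the sample count, and the interval are illustrative assumptions.

```shell
# Print the resident set size (RSS, in KiB) of a process several times.
sample_rss() {
    pid=$1
    samples=$2
    interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        ps -o rss= -p "$pid" | tr -d ' '
        i=$((i + 1))
        sleep "$interval"
    done
}

# Read one number per line; succeed only if every value exceeds the
# previous one, which is the signature of steadily growing memory.
is_growing() {
    awk 'NR > 1 && $1 <= prev { bad = 1 } { prev = $1 } END { exit (NR < 2 || bad) }'
}

# Usage:
#   for pid in $(pgrep -x node); do
#       sample_rss "$pid" 5 60 | is_growing && kill "$pid"
#   done
```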

Discover Unexpected Background Jobs
Linux · Plus500 · mid
Scenario

You have noticed an unexpected spike in system load. You suspect a batch of recently spawned jobs is responsible and need to isolate processes that started within the last few minutes.

Task

Identify all processes started within the last 10 minutes and save their PID, User, Start Date (lstart), and Command to /home/devops/recent_processes.txt. Once the recent processes are written to the file, terminate the suspicious ones (i.e., any process that does not belong to the root user or systemd).

Example

The file /home/devops/recent_processes.txt should contain the list of recently started processes:
```
  PID USER     STARTED                     CMD
 8234 deploy   Wed Oct 29 16:37:12 2025    /opt/scripts/deploy.sh
 8235 deploy   Wed Oct 29 16:37:15 2025    bash ./worker_start.sh
```
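
The filtering step can be sketched with the `etimes` column of procps `ps` (elapsed seconds since start). The helper name `filter_recent` is illustrative. Termination is left as a manual step after reviewing the file, because the listing also includes your own shell and the `ps` pipeline itself.

```shell
# Filter `ps -eo etimes=,pid=,user=,lstart=,cmd=` output read from stdin.
# Lines older than max_age seconds are dropped and the etimes column is
# removed, leaving PID, USER, start date and command.
filter_recent() {
    max_age=$1
    awk -v max="$max_age" '$1 <= max { $1 = ""; sub(/^ +/, ""); print }'
}

# Usage:
#   {
#       echo "PID USER STARTED CMD"
#       ps -eo etimes=,pid=,user=,lstart=,cmd= | filter_recent 600
#   } > /home/devops/recent_processes.txt
#   kill <PID>      # for each reviewed entry not owned by root
```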
Fix Inode Exhaustion Issue
Linux · DeutscheBank · mid
Scenario

Your server cannot create new files. Commands like touch fail with "No space left on device" errors, but df -h shows plenty of free disk space. The filesystem has exhausted available inodes.

Task

Save inode usage to /home/interview/inode_usage.txt, find which directory contains excessive files, save the problematic directory path to /home/interview/problem_directory.txt, clean up the files, and verify the fix.

Example

File: /home/interview/inode_usage.txt
```
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            6553600 6553600       0  100% /
/dev/sdb1            3276800  125000 3151800    4% /data
```
File: /home/interview/problem_directory.txt
```
/var/spool/postfix/maildrop
```
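
Finding the offending directory amounts to counting files per directory on the affected filesystem. A minimal sketch, with `busiest_dir` as an illustrative helper name:

```shell
# Print the directory under root that directly contains the most files.
# -xdev keeps find on one filesystem, since inodes are per filesystem.
busiest_dir() {
    find "$1" -xdev -type f 2>/dev/null |
        sed 's|/[^/]*$||' |
        sort | uniq -c | sort -rn | head -n 1 |
        sed -E 's/^ *[0-9]+ //'
}

# Usage (as root, after confirming the files are disposable):
#   df -i > /home/interview/inode_usage.txt
#   busiest_dir / > /home/interview/problem_directory.txt
#   find "$(cat /home/interview/problem_directory.txt)" -xdev -type f -delete
#   df -i /
```

The first `sed` strips the file name to leave the parent directory, and the last one drops the count column added by `uniq -c`.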
Monitoring Process Ownership
Linux · HashiCorp · mid
Scenario

The server is consuming excessive resources. It is shared by multiple teams, each with its own credentials (e.g., usernames such as dev-team, qa-team, and ops-team).

Task

Identify which user is running the largest number of processes on the server, regardless of CPU or memory usage, and write that username to /home/devops/solution.txt

Example

A single username written to the expected output file.
```
<username>
```
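
Counting processes per owner is a sort-and-count pipeline. A minimal sketch, with `top_user` as an illustrative helper name (ties are resolved arbitrarily):

```shell
# Read one username per line and print the most frequent one.
top_user() {
    sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }'
}

# Usage:
#   ps -eo user= | top_user > /home/devops/solution.txt
```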
Real-Time Log Timestamping
Linux · Adobe · mid
Scenario

You're troubleshooting a service that produces untagged log output when run manually, making it difficult to analyze timing and sequence of events.

Task

Create a command that reads from standard input line by line and appends the current timestamp to the end of each line as it's read. Test it interactively by piping output to verify it works, then save the solution to a shell script at /usr/local/bin/timestamp.sh and make it executable so it can be used in any pipeline.

Example
```
# Before (untagged log output)

Application started
Processing request #1234
Database connection established
Request completed
```
```
# After (timestamped in real-time)

Application started - 2025-11-06 15:30:45
Processing request #1234 - 2025-11-06 15:30:46
Database connection established - 2025-11-06 15:30:47
Request completed - 2025-11-06 15:30:48
```
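
A minimal sketch of such a filter, using the separator and date format from the example above. The function name `add_timestamps` is illustrative.

```shell
# Append the current timestamp to each line read from standard input.
add_timestamps() {
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        printf '%s - %s\n' "$line" "$(date '+%Y-%m-%d %H:%M:%S')"
    done
}

# Usage:
#   some_command | add_timestamps
```

Saved to /usr/local/bin/timestamp.sh with a `#!/bin/sh` first line and a final `add_timestamps` call, then marked executable with `chmod +x`, the same loop works in any pipeline.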
Update AWS Configs
Linux · Stripe · mid
Scenario

Each application environment (staging, dev, prod) has its own configuration file stored under /etc/app/envs/, and each file currently has multi_az = false and availability_zone = "us-east-1a". Manually editing each file is error-prone and inefficient, so the change must be automated.

Task

Locate all .conf files under /etc/app/envs/ across the environment subdirectories, update the multi_az setting from false to true, and change the availability_zone line to include two zones: "us-east-1a,us-east-1b". Perform these edits in place while preserving all other configuration values.

Example
```
# Before (single-AZ configuration)

region = "us-east-1"
availability_zone = "us-east-1a"
multi_az = false
```
```
# After (multi-AZ configuration enabled)

region = "us-east-1"
availability_zone = "us-east-1a,us-east-1b"
multi_az = true
```
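
The bulk edit can be sketched with `find` and GNU `sed -i`. It assumes the `key = value` layout shown in the example, and the function name `enable_multi_az` is illustrative.

```shell
# Enable multi-AZ in every .conf file below a directory, in place.
enable_multi_az() {
    find "$1" -type f -name '*.conf' -exec sed -i -E \
        -e 's/^(multi_az[[:space:]]*=[[:space:]]*)false/\1true/' \
        -e 's/^(availability_zone[[:space:]]*=[[:space:]]*)"us-east-1a"/\1"us-east-1a,us-east-1b"/' \
        {} +
}

# Usage:
#   enable_multi_az /etc/app/envs
#   grep -rE 'multi_az|availability_zone' /etc/app/envs
```

Anchoring each pattern to the start of the line leaves every other setting untouched.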
Upload Safe File Partitioning
Linux · GoDaddy · mid
Scenario

Your application uploads files from /tmp/app/, but the maximum allowed file size is 1 MB, and some files exceed this limit.

Task

Find all files larger than 1 MB in /tmp/app/ and its subdirectories, split each oversized file into 1 MB chunks in the same directory where the original file is located with a recognizable naming pattern (e.g., original_filename.part_aa), keep the original files intact, and verify that the chunks were created successfully.

Example
```
# Before (files exceed 1 MB limit)

/tmp/app/uploads/video.mp4 (3.2 MB)
/tmp/app/data/archive.tar.gz (2.5 MB)

Cannot upload due to size restrictions
```
```
# After (files split into 1 MB chunks)

/tmp/app/uploads/video.mp4
/tmp/app/uploads/video.mp4.part_aa
/tmp/app/uploads/video.mp4.part_ab
/tmp/app/uploads/video.mp4.part_ac

/tmp/app/data/archive.tar.gz
/tmp/app/data/archive.tar.gz.part_aa
/tmp/app/data/archive.tar.gz.part_ab

Chunks ready for upload within size limits
```
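
A minimal sketch of the find-and-split step, treating 1 MB as 1048576 bytes (an assumption). The function name `split_oversized` is illustrative.

```shell
# Split every file larger than 1 MiB under a directory into 1 MiB chunks
# named <file>.part_aa, <file>.part_ab, ... next to the original.
split_oversized() {
    # Skip chunks from an earlier run so they are never split again
    find "$1" -type f -size +1048576c ! -name '*.part_*' -exec sh -c '
        for f do
            split -b 1048576 "$f" "$f.part_"
        done' sh {} +
}

# Usage:
#   split_oversized /tmp/app
#   find /tmp/app -name '*.part_*' -exec ls -l {} +
```

The originals stay in place, and concatenating the chunks in name order reproduces each file.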
Using Unmounted Partitions
Linux · RedHat · mid
Scenario

The server has unmounted partitions that are not being used and could be utilized for additional storage.

Task

Identify unmounted partitions that are safe to use (avoiding system-critical partitions such as /, /boot, /boot/efi, or swap), create an ext4 filesystem on one of them with the label data_extra, mount it at /mnt/test, and verify it's accessible.

Example
```
# Before (unmounted partition unused)
Block devices scanned, unmounted partitions found
loop0p2: 20GB unmounted, no filesystem
```
```
# After (partition formatted and mounted)
Filesystem created: ext4 with label=data_extra on /dev/loop0p2
Mounted at: /mnt/test
/dev/loop0p2 on /mnt/test type ext4 (rw,relatime)
Partition ready for use
```
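
Candidate partitions are those with neither a filesystem nor a mount point, which rules out swap and anything already formatted. The sketch below filters the key="value" output of `lsblk -P`. The helper name `unused_partitions` is illustrative, and the device name in the usage lines is taken from the example above.

```shell
# Read `lsblk -P -o NAME,TYPE,FSTYPE,MOUNTPOINT` output from stdin and print
# the names of partitions that have no filesystem and no mount point.
unused_partitions() {
    grep 'TYPE="part"' |
        grep 'FSTYPE=""' |
        grep 'MOUNTPOINT=""' |
        sed -E 's/^NAME="([^"]*)".*/\1/'
}

# Usage (as root):
#   lsblk -P -o NAME,TYPE,FSTYPE,MOUNTPOINT | unused_partitions
#   mkfs.ext4 -L data_extra /dev/loop0p2
#   mkdir -p /mnt/test && mount /dev/loop0p2 /mnt/test
#   findmnt /mnt/test
```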

Networking 4 questions

Fix Port Exhaustion for High Speed Scraper
Networking · X · mid
Scenario

A web-scraper systemd service is running on this system, making continuous HTTP requests. The service has started experiencing connection failures - logs show HTTP status 000 errors, indicating connections cannot be established even though the network is functional and remote servers are accessible.

Task

Check the service status and logs to understand the failures, investigate the underlying cause, identify which system resource is exhausted, and apply the appropriate kernel configuration change.

Once the fix is applied, you can test the connection with curl -o /dev/null -s -w "HTTP Status: %{http_code}\n" http://example.com
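
Going by the question title, the exhausted resource is probably the ephemeral (local) port range. That is an assumption to confirm from the service logs. Under that assumption, the sketch below computes how many ports the current range provides, with `port_range_size` as an illustrative helper name.

```shell
# Number of ephemeral ports in a "low high" range, as stored in
# /proc/sys/net/ipv4/ip_local_port_range.
port_range_size() {
    set -- $1          # split "low high" into $1 and $2
    echo $(( $2 - $1 + 1 ))
}

# Usage (sysctl -w requires root):
#   port_range_size "$(cat /proc/sys/net/ipv4/ip_local_port_range)"
#   ss -tan state time-wait | wc -l      # sockets holding ports in TIME_WAIT
#   sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```

A very small result combined with many TIME_WAIT sockets would support the diagnosis. Widening the range with `sysctl -w` lasts only until reboot unless it is also written to a file under /etc/sysctl.d/.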

Forward Traffic Between Ports
Networking · Meta · mid
Scenario

A service on your server is running on port 8080, but you now need it to also be reachable on port 8081. The application cannot be restarted and its configuration cannot be changed.

Task

Verify that the service is listening on 127.0.0.1:8080. Configure iptables to forward all TCP traffic from port 8081 to port 8080, ensuring this works for both external requests and local connections (localhost). Verify that the forwarding works, then use iptables-save to save the rules so they persist after a reboot.

Example
```
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      1234/java

PREROUTING (policy ACCEPT)
REDIRECT   tcp  --  0.0.0.0/0   0.0.0.0/0   tcp dpt:8081 redir ports 8080

OUTPUT (policy ACCEPT)
REDIRECT   tcp  --  0.0.0.0/0   127.0.0.1   tcp dpt:8081 redir ports 8080
```
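
The two rules in the example correspond to one REDIRECT rule per chain. The sketch below wraps them in a dry-run helper so they can be previewed before being applied. The names `run`, `forward_port`, and `DRY_RUN` are illustrative, and the rules file path is an assumption that depends on the distribution.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    ${DRY_RUN:+echo} "$@"
}

# Redirect TCP traffic arriving on one port to another local port.
forward_port() {
    from=$1
    to=$2
    # Packets arriving from other hosts traverse PREROUTING
    run iptables -t nat -A PREROUTING -p tcp --dport "$from" -j REDIRECT --to-ports "$to"
    # Locally generated packets skip PREROUTING and traverse OUTPUT instead
    run iptables -t nat -A OUTPUT -p tcp -d 127.0.0.1 --dport "$from" -j REDIRECT --to-ports "$to"
}

# Usage:
#   DRY_RUN=1 forward_port 8081 8080         # preview the rules
#   forward_port 8081 8080                   # apply them (requires root)
#   curl -s http://127.0.0.1:8081/
#   iptables-save > /etc/iptables/rules.v4   # path used by iptables-persistent
```

For a service bound only to 127.0.0.1, REDIRECT in PREROUTING rewrites the destination to the incoming interface's address rather than loopback, so external access may need additional configuration. This is worth verifying in the lab.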
Inspecting HTTP Traffic Flow
Networking · Airbnb · mid
Scenario

You suspect the web service isn't receiving HTTP requests, and you need to confirm network traffic to port 80.

Task

Capture network packets destined for or originating from port 80 (HTTP traffic), limit the capture to the first 10 packets to avoid large files, save the captured packets to /tmp/http_traffic.pcap in pcap format, read the capture file and extract key information (source IP, destination IP with port, TCP flags), create a human-readable summary showing packet flow and TCP handshake details, and save the summary to /tmp/http_summary.txt in the format SOURCE_IP -> DEST_IP:PORT [FLAGS].

You may use tcpdump to capture and inspect packets.

Important

You can run the script below to generate http_summary.txt instead of filling in the file manually, since the main goal is to test troubleshooting.
```
# Decode the saved capture as text (-nn keeps numeric addresses and ports).
# Requires gawk: the three-argument form of match() is a GNU extension.
tcpdump -nn -r /tmp/http_traffic.pcap 2>/dev/null | awk '/Flags/ && /IP/ {
    # Skip IPv6 packets, only process IPv4
    if ($0 ~ /IP6/) next

    # Extract source and destination IPs with ports
    # Pattern: IP source.port > dest.port
    if (match($0, /IP ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+) > ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)/, parts)) {
        src_ip = parts[1]
        src_port = parts[2]
        dst_ip = parts[3]
        dst_port = parts[4]

        # Extract TCP flags, e.g. [S], [S.], [P.]
        if (match($0, /Flags (\[[^\]]+\])/, flags)) {
            print src_ip ":" src_port " -> " dst_ip ":" dst_port " " flags[1]
        }
    }
}' > /tmp/http_summary.txt
```
Validating Network Routes
Networking · Google · mid
Scenario

Your server uses multiple network interfaces and may have incorrect routing for a specific subnet. You need to verify and fix it to ensure proper traffic flow.

Task

Display the current routing table to identify existing routes, check if a route for the 10.10.0.0/16 subnet exists and which interface and gateway it uses. If the route goes through eth0, delete the existing route and add a new route for 10.10.0.0/16 via gateway 192.168.100.1 using interface eth1.

Example
```
# Before (incorrect route through eth0)

10.10.0.0/16 routed via eth0
Gateway: 192.168.50.1
Traffic experiencing packet loss
```
```
# After (corrected route through eth1)

10.10.0.0/16 routed via eth1
Gateway: 192.168.100.1
Route verified and active
```
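
The check-then-replace logic can be sketched as a helper that prints the commands to run, so they can be reviewed first. The function name `plan_route_fix` is illustrative.

```shell
# Given the current `ip route show 10.10.0.0/16` line, print the commands
# needed to move the route to gateway 192.168.100.1 on eth1.
# Prints nothing when the route does not go through eth0.
plan_route_fix() {
    current=$1
    subnet=10.10.0.0/16
    case " $current " in
        *" dev eth0 "*)
            echo "ip route del $subnet"
            echo "ip route add $subnet via 192.168.100.1 dev eth1"
            ;;
    esac
}

# Usage:
#   ip route show                                   # full routing table
#   plan_route_fix "$(ip route show 10.10.0.0/16)"  # review, then run as root
#   ip route get 10.10.0.1                          # confirm eth1 is selected
```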

Security 1 question

Fix HTTPS Certificate Error
Security · GitHub · mid
Scenario

A minimal HTTPS webserver script (webserver.sh) listening on port 8443 fails to establish secure connections. The bundled SSL certificate (old_server.crt) lacks a Subject Alternative Name (SAN) for the local IP 127.0.0.1, causing hostname verification failures.

Task

Run the broken server and inspect the certificate to confirm the missing SAN. Generate a new self-signed certificate with SAN set to IP:127.0.0.1 and save it as server.crt and server.key. Update webserver.sh to use the new certificate files, launch the fixed server, and verify connectivity.

Example
```
# Before (Missing SAN)
subject=CN = wrong.example.com
curl: (60) SSL: no alternative certificate subject name matches target host name '1.1.1.1'

# After (SAN Present)
subject=CN = example
X509v3 Subject Alternative Name: IP Address:1.1.1.1
Hello World
```
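
One way to inspect the SAN and generate a replacement certificate with OpenSSL is sketched below. The function names are illustrative, the CN value is arbitrary, and `-addext` assumes OpenSSL 1.1.1 or newer.

```shell
# Create a self-signed certificate whose SAN covers 127.0.0.1.
make_san_cert() {
    dir=$1
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$dir/server.key" -out "$dir/server.crt" \
        -days 365 -subj "/CN=example" \
        -addext "subjectAltName=IP:127.0.0.1" 2>/dev/null
}

# Print the SAN section of a certificate (empty output means no SAN).
show_san() {
    openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Usage:
#   show_san old_server.crt
#   make_san_cert . && show_san server.crt
#   curl --cacert server.crt https://127.0.0.1:8443/
```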