Troubleshooting & Operations
Monitor your scanner deployment, perform maintenance tasks, and resolve common issues.
Monitoring
Health Checks
Monitor scanner health from within your network:
```bash
# API health
curl https://scanner.internal.example.com/health

# Detailed status
curl https://scanner.internal.example.com/status
```

Viewing Logs
```bash
# Scan Scheduler logs
kubectl logs -n scanner -l app=scan-scheduler -f

# Scan Manager logs
kubectl logs -n scanner -l app=scan-manager -f

# All scanner logs
kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner -f
```

Resource Usage
Monitor resource consumption:
```bash
kubectl top pods -n scanner
```

Prometheus (Optional)
If Prometheus is enabled in your deployment:
```bash
kubectl port-forward -n monitoring svc/prometheus 9090:9090
open http://localhost:9090
```

Cloud-Specific Monitoring
For cloud-specific monitoring options, see your cloud provider's deployment guide.
Maintenance
Updating Scanner Version
Scanner updates are managed through your Terraform configuration. When a new version is available, update your module version and apply:
```bash
terraform init -upgrade
terraform apply
```

The Helm chart performs a rolling update with zero downtime.
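To confirm the rolling update finishes cleanly, you can watch the rollout of each deployment, assuming the `scanner` namespace used throughout this guide:

```shell
# Wait (up to 5 minutes each) for every scanner deployment to finish rolling out
for d in $(kubectl get deploy -n scanner -o name); do
  kubectl rollout status "$d" -n scanner --timeout=5m
done
```

If a rollout stalls, `kubectl rollout status` reports which deployment is stuck, which narrows down where to look in the logs.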
Restarting Components
```bash
# Restart all scanner components
kubectl rollout restart deployment -n scanner

# Restart a specific component
kubectl rollout restart deployment/scan-scheduler -n scanner
```

Deployment Issues
First Terraform Apply Fails on Kubernetes/Helm Resources
Symptoms: First terraform apply creates the EKS cluster but fails with authentication or timeout errors on kubernetes_* or helm_* resources.
Cause: The Kubernetes and Helm providers need the EKS cluster endpoint to authenticate, but the cluster doesn’t fully exist until Terraform creates it. The try() wrappers in providers.tf allow the first apply to create the cluster, but Kubernetes resources can fail if IAM access entries haven’t propagated yet. This does not happen in all environments.
Solution: Run terraform apply a second time. The cluster is now fully provisioned and the providers can connect.
Terraform Hangs on Kubernetes/Helm Resources
Symptoms: terraform apply hangs indefinitely (no progress for several minutes) on kubernetes_* or helm_release.* resources.
Cause: Terraform cannot reach the EKS API endpoint from your network.
Solution A (no VPN): Enable the public EKS API endpoint:

```hcl
module "internal_scanner" {
  # ...
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = ["your-ip/32"] # Restrict to your IP
}
```

Solution B (with VPN): Add security group rules to allow Terraform access from your VPN network:
```hcl
module "internal_scanner" {
  # ...
  cluster_security_group_additional_rules = {
    ingress_terraform = {
      description = "Allow Terraform access to EKS API"
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
      cidr_blocks = ["your-vpn-cidr/24"]
    }
  }
}
```

See EKS API Access for details on both options.
ALB Not Created After Deployment
Symptoms: Scanner endpoint unreachable, no load balancer visible in AWS Console, ingress shows no address.
Diagnosis:
```bash
kubectl get ingress -n scanner
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
```

Common causes:
- Missing subnet tags: private subnets must be tagged `kubernetes.io/role/internal-elb = 1`. Add with:
  ```bash
  aws ec2 create-tags \
    --resources subnet-aaaaa subnet-bbbbb \
    --tags Key=kubernetes.io/role/internal-elb,Value=1
  ```
- Missing IAM permissions for the ALB controller
- Security group rules blocking the controller
Terraform Destroy Fails
Symptoms: terraform destroy errors out or hangs.
Cause: The ALB controller needs time to clean up AWS resources (load balancers, target groups) before being removed. The destroy ordering can cause conflicts.
Solution: Run terraform destroy again. The module includes cleanup waits, but they may not always be sufficient. If the failure persists, manually delete the load balancers in the AWS Console, then retry.
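If you need to clean up manually, the stray load balancers and their target groups can also be removed with the AWS CLI. This is a sketch: list the resources first and verify by name which ones belong to the scanner before deleting anything; the ARN placeholders must be filled in from the listing output.

```shell
# List load balancers and note any created by the scanner's ALB controller
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].{Name:LoadBalancerName,Arn:LoadBalancerArn}' --output table

# Delete a specific load balancer by ARN, then any orphaned target groups
aws elbv2 delete-load-balancer --load-balancer-arn <load-balancer-arn>
aws elbv2 describe-target-groups \
  --query 'TargetGroups[].{Name:TargetGroupName,Arn:TargetGroupArn}' --output table
aws elbv2 delete-target-group --target-group-arn <target-group-arn>
```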
Common Issues
Scanner Not Connecting to Detectify
Symptoms: Scanner shows as disconnected in the Detectify UI.
Steps to diagnose:

- Verify outbound internet access:
  ```bash
  kubectl exec -it -n scanner deploy/scan-scheduler -- curl -v https://api.detectify.com/health
  ```
- Check that the API token is configured correctly:
  ```bash
  kubectl get secret -n scanner scanner-config -o yaml
  ```
- Check scan-scheduler logs for connection errors:
  ```bash
  kubectl logs -n scanner -l app=scan-scheduler --tail=100
  ```
Scans Failing
Symptoms: Scans start but fail to complete or report errors.
Steps to diagnose:

- Check scan manager logs for errors:
  ```bash
  kubectl logs -n scanner -l app=scan-manager --tail=100
  ```
- Verify network connectivity to the target application:
  ```bash
  kubectl exec -it -n scanner deploy/scan-manager -- curl -v https://target-app.internal
  ```
- Check whether scan-worker pods are being created:
  ```bash
  kubectl get pods -n scanner -w
  ```
Pods Not Starting
Symptoms: Pods stuck in Pending or CrashLoopBackOff state.
Steps to diagnose:

- Check pod status and events:
  ```bash
  kubectl describe pod -n scanner <pod-name>
  ```
- View pod logs:
  ```bash
  kubectl logs -n scanner <pod-name>
  ```
- Check node resources:
  ```bash
  kubectl top nodes
  ```
Common causes:
- Insufficient cluster resources (nodes need to scale up)
- Image pull errors (check registry credentials)
- Configuration errors (check secrets and configmaps)
High Resource Usage / OOMKilled
Symptoms: Pods being killed due to memory limits, slow performance.
Steps to diagnose:

- Monitor resource consumption:
  ```bash
  kubectl top pods -n scanner
  ```
- Check for OOMKilled events:
  ```bash
  kubectl get events -n scanner --field-selector reason=OOMKilled
  ```
Solution: Increase memory limits in your Terraform configuration or reduce concurrent scans. See Scaling for capacity planning guidance.
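As a sketch, raising the limits in Terraform might look like the following. The input names here are hypothetical; check your module's documented variables for the actual names and defaults.

```hcl
module "internal_scanner" {
  # ...
  # Hypothetical input names -- consult your module's variable definitions
  scan_worker_memory_limit = "4Gi" # raise the per-worker memory limit
  max_concurrent_scans     = 2     # or reduce scan concurrency instead
}
```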
Image Pull Errors
Symptoms: Pods stuck with ImagePullBackOff or ErrImagePull status.
Steps to diagnose:

- Check pod events for details:
  ```bash
  kubectl describe pod -n scanner <pod-name>
  ```
- Verify container registry credentials are configured:
  ```bash
  kubectl get secret -n scanner regcred -o yaml
  ```
Solution: Verify your Docker credentials from the Detectify UI are correctly configured. Contact Detectify support if you’re unable to pull images.
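If the `regcred` secret is missing or stale, it can be recreated with the credentials shown in the Detectify UI. The registry URL, username, and password below are placeholders to be filled in from the UI:

```shell
# Replace the registry credential secret in the scanner namespace
kubectl delete secret regcred -n scanner --ignore-not-found
kubectl create secret docker-registry regcred -n scanner \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password>
```

After recreating the secret, delete the affected pods so they are rescheduled and pull the image with the new credentials.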
Load Balancer Not Created
Symptoms: Scanner endpoint unreachable, no load balancer provisioned.
Steps to diagnose:

- Check ingress and service status:
  ```bash
  kubectl get svc -n scanner
  kubectl get ingress -n scanner
  ```
- Check cloud-specific load balancer controller logs (varies by provider)
Solution: See your cloud provider’s deployment guide for specific troubleshooting steps.
Getting Help
If you’re unable to resolve an issue:
- Collect diagnostic information:
  ```bash
  kubectl get pods -n scanner -o wide
  kubectl describe pods -n scanner
  kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner --tail=200
  kubectl get events -n scanner --sort-by='.lastTimestamp'
  ```
- Contact Detectify support with the diagnostic output and a description of the issue.
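The collection step above can be bundled into a single timestamped file to attach to the support ticket (the filename is just a suggestion):

```shell
# Capture all diagnostics into one file to attach to the support ticket
OUT="scanner-diagnostics-$(date +%Y%m%d-%H%M%S).txt"
{
  kubectl get pods -n scanner -o wide
  kubectl describe pods -n scanner
  kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner --tail=200
  kubectl get events -n scanner --sort-by='.lastTimestamp'
} > "$OUT" 2>&1
echo "Diagnostics written to $OUT"
```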