Troubleshooting & Operations

Monitor your scanner deployment, perform maintenance tasks, and resolve common issues.

Monitoring

Health Checks

Monitor scanner health from within your network:

# API health
curl https://scanner.internal.example.com/health

# Detailed status
curl https://scanner.internal.example.com/status
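If your workstation cannot reach the internal endpoint directly, the same check can be run from a short-lived pod inside the cluster. A sketch, assuming a curl-capable image such as curlimages/curl is pullable from your nodes:

# One-off curl from inside the cluster network; the pod is removed on exit
kubectl run curl-check -n scanner --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sv https://scanner.internal.example.com/health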

Viewing Logs

# Scan Scheduler logs
kubectl logs -n scanner -l app=scan-scheduler -f

# Scan Manager logs
kubectl logs -n scanner -l app=scan-manager -f

# All scanner logs
kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner -f

Resource Usage

Monitor resource consumption:

kubectl top pods -n scanner
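Two variations worth knowing: --containers breaks usage down per container, and --sort-by surfaces the heaviest pods first:

# Per-container breakdown
kubectl top pods -n scanner --containers

# Heaviest memory consumers first
kubectl top pods -n scanner --sort-by=memory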

Prometheus (Optional)

If Prometheus is enabled in your deployment:

kubectl port-forward -n monitoring svc/prometheus 9090:9090
open http://localhost:9090
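With the port-forward running, the Prometheus HTTP API can be queried directly. The metric below is the standard cadvisor working-set gauge; whether it is available depends on what your Prometheus is configured to scrape:

# Per-pod working-set memory in the scanner namespace
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_memory_working_set_bytes{namespace="scanner"}) by (pod)'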

Cloud-Specific Monitoring

For cloud-specific monitoring options, see your cloud provider's deployment guide.

Maintenance

Updating Scanner Version

Scanner updates are managed through your Terraform configuration. When a new version is available, update your module version and apply:

terraform init -upgrade
terraform apply

The Helm chart performs a rolling update with zero downtime.
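To confirm the rollout finished cleanly, watch each deployment report its status (the deployment names below match the components used elsewhere in this guide):

# Block until each component has rolled to the new version
kubectl rollout status deployment/scan-scheduler -n scanner
kubectl rollout status deployment/scan-manager -n scanner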

Restarting Components

# Restart all scanner components
kubectl rollout restart deployment -n scanner

# Restart specific component
kubectl rollout restart deployment/scan-scheduler -n scanner

Deployment Issues

First Terraform Apply Fails on Kubernetes/Helm Resources

Symptoms: First terraform apply creates the EKS cluster but fails with authentication or timeout errors on kubernetes_* or helm_* resources.

Cause: The Kubernetes and Helm providers need the EKS cluster endpoint to authenticate, but the cluster doesn’t fully exist until Terraform creates it. The try() wrappers in providers.tf allow the first apply to create the cluster, but Kubernetes resources can fail if IAM access entries haven’t propagated yet. This does not happen in all environments.

Solution: Run terraform apply a second time. The cluster is now fully provisioned and the providers can connect.
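If you prefer a deterministic two-phase rollout over re-running a failed apply, a targeted first pass is a common pattern. This is a sketch only; the address of the EKS submodule is an assumption, so check your configuration for the real path:

# Phase 1: create the cluster first (submodule address is illustrative)
terraform apply -target=module.internal_scanner.module.eks

# Phase 2: everything else, now that the cluster endpoint exists
terraform apply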

Terraform Hangs on Kubernetes/Helm Resources

Symptoms: terraform apply hangs indefinitely (no progress for several minutes) on kubernetes_* or helm_release.* resources.

Cause: Terraform cannot reach the EKS API endpoint from your network.

Solution A — No VPN: Enable the public EKS API endpoint:

module "internal_scanner" { # ... cluster_endpoint_public_access = true cluster_endpoint_public_access_cidrs = ["your-ip/32"] # Restrict to your IP }

Solution B — With VPN: Add security group rules to allow Terraform access from your VPN network:

module "internal_scanner" { # ... cluster_security_group_additional_rules = { ingress_terraform = { description = "Allow Terraform access to EKS API" protocol = "tcp" from_port = 443 to_port = 443 type = "ingress" cidr_blocks = ["your-vpn-cidr/24"] } } }

See EKS API Access for details on both options.
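Before changing either setting, confirm the endpoint really is unreachable from the machine running Terraform. A quick check, assuming the AWS CLI is configured (replace the cluster name):

# Any HTTP response (even 401/403) means the endpoint is reachable;
# a timeout means your network cannot reach it
ENDPOINT=$(aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.endpoint' --output text)
curl -sk --max-time 5 "$ENDPOINT/version" || echo "endpoint unreachable"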

ALB Not Created After Deployment

Symptoms: Scanner endpoint unreachable, no load balancer visible in AWS Console, ingress shows no address.

Diagnosis:

kubectl get ingress -n scanner
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller

Common causes:

  • Missing subnet tags: Private subnets must have kubernetes.io/role/internal-elb = 1 (a verification command follows this list). Add the tag with:

    aws ec2 create-tags \
      --resources subnet-aaaaa subnet-bbbbb \
      --tags Key=kubernetes.io/role/internal-elb,Value=1
  • Missing IAM permissions for the ALB controller
  • Security group rules blocking the controller
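To verify the tags landed (same placeholder subnet IDs as above):

aws ec2 describe-subnets \
  --subnet-ids subnet-aaaaa subnet-bbbbb \
  --query 'Subnets[].{ID:SubnetId,Tags:Tags}'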

Terraform Destroy Fails

Symptoms: terraform destroy errors out or hangs.

Cause: The ALB controller needs time to clean up AWS resources (load balancers, target groups) before being removed. The destroy ordering can cause conflicts.

Solution: Run terraform destroy again. The module includes cleanup waits, but they may not be sufficient. If the failure persists, manually delete the remaining load balancers in the AWS Console, then retry.
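To find the leftovers, list load balancers with their VPC so orphans from the scanner's VPC are easy to spot:

# Match VpcId against the scanner VPC to find blocking resources
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].{Name:LoadBalancerName,VPC:VpcId,State:State.Code}' \
  --output table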


Common Issues

Scanner Not Connecting to Detectify

Symptoms: Scanner shows as disconnected in the Detectify UI.

Steps to diagnose:

  1. Verify outbound internet access:

    kubectl exec -it -n scanner deploy/scan-scheduler -- curl -v https://api.detectify.com/health
  2. Check that the API token is configured correctly (the values are base64-encoded; a decode example follows this list):

    kubectl get secret -n scanner scanner-config -o yaml
  3. Check scan-scheduler logs for connection errors:

    kubectl logs -n scanner -l app=scan-scheduler --tail=100
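The values in step 2 are base64-encoded. A decode example for a quick spot check; the key name api-token is illustrative, so check the secret's data section for the actual key:

# Decode a single key from the secret (key name is an assumption)
kubectl get secret -n scanner scanner-config \
  -o jsonpath='{.data.api-token}' | base64 -d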

Scans Failing

Symptoms: Scans start but fail to complete or report errors.

Steps to diagnose:

  1. Check scan manager logs for errors:

    kubectl logs -n scanner -l app=scan-manager --tail=100
  2. Verify network connectivity to the target application (if this fails to resolve, see the DNS check after this list):

    kubectl exec -it -n scanner deploy/scan-manager -- curl -v https://target-app.internal
  3. Check if scan-worker pods are being created:

    kubectl get pods -n scanner -w
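If the connectivity check in step 2 fails with a name-resolution error, test DNS from inside the pod. getent is a guess at what the image ships; substitute whichever resolver tool is available:

# Resolve the target hostname using the pod's DNS configuration
kubectl exec -it -n scanner deploy/scan-manager -- \
  getent hosts target-app.internal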

Pods Not Starting

Symptoms: Pods stuck in Pending or CrashLoopBackOff state.

Steps to diagnose:

  1. Check pod status and events:

    kubectl describe pod -n scanner <pod-name>
  2. View pod logs:

    kubectl logs -n scanner <pod-name>
  3. Check node resources:

    kubectl top nodes

Common causes:

  • Insufficient cluster resources (nodes need to scale up; see the events query after this list)
  • Image pull errors (check registry credentials)
  • Configuration errors (check secrets and configmaps)
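For the resource case in particular, FailedScheduling events state which constraint (CPU, memory, taints) is blocking placement:

# Show why pods cannot be scheduled
kubectl get events -n scanner \
  --field-selector reason=FailedScheduling --sort-by='.lastTimestamp'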

High Resource Usage / OOMKilled

Symptoms: Pods being killed due to memory limits, slow performance.

Steps to diagnose:

  1. Monitor resource consumption:

    kubectl top pods -n scanner
  2. Check for OOMKilled events:

    kubectl get events -n scanner --field-selector reason=OOMKilled

Solution: Increase memory limits in your Terraform configuration or reduce concurrent scans. See Scaling for capacity planning guidance.
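Events can age out before you look; the container status records the last termination reason directly, so OOM kills show up even after the event is gone:

# Print each pod with the reason its containers last terminated
kubectl get pods -n scanner -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'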

Image Pull Errors

Symptoms: Pods stuck with ImagePullBackOff or ErrImagePull status.

Steps to diagnose:

  1. Check pod events for details:

    kubectl describe pod -n scanner <pod-name>
  2. Verify container registry credentials are configured:

    kubectl get secret -n scanner regcred -o yaml

Solution: Verify your Docker credentials from the Detectify UI are correctly configured. Contact Detectify support if you’re unable to pull images.
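The registry credentials live under the secret's .dockerconfigjson key; decoding it shows which registry and username the cluster is actually using. Treat the output as sensitive, since it includes the auth token:

# Decode the Docker config to inspect registry and username
kubectl get secret -n scanner regcred \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d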

Load Balancer Not Created

Symptoms: Scanner endpoint unreachable, no load balancer provisioned.

Steps to diagnose:

  1. Check ingress/service status:

kubectl get svc -n scanner
kubectl get ingress -n scanner
  2. Check cloud-specific load balancer controller logs (varies by provider)

Solution: See your cloud provider’s deployment guide for specific troubleshooting steps.

Getting Help

If you’re unable to resolve an issue:

  1. Collect diagnostic information (a one-shot capture script follows this list):

    kubectl get pods -n scanner -o wide
    kubectl describe pods -n scanner
    kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner --tail=200
    kubectl get events -n scanner --sort-by='.lastTimestamp'
  2. Contact Detectify support with the diagnostic output and a description of the issue.
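To capture everything from step 1 in a single file to attach to your ticket, a minimal collection script (the filename is arbitrary):

# Bundle the diagnostics into one file for support
{
  kubectl get pods -n scanner -o wide
  kubectl describe pods -n scanner
  kubectl logs -n scanner -l app.kubernetes.io/part-of=scanner --tail=200
  kubectl get events -n scanner --sort-by='.lastTimestamp'
} > scanner-diagnostics.txt 2>&1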
