API Discovery

AI & Modern APIs | Security Notes | Jan 14, 2026 | BASH
Tags: governance, security, inventory, tooling

Definition

API Discovery is the continuous process of identifying all API endpoints that exist in an organization’s infrastructure, whether documented or not. It answers the fundamental question: “What APIs do we actually have running in production?”

Discovery is critical because organizations rarely have accurate inventories of their APIs. According to industry research, 50% of organizations don’t know how many APIs they have in production. This gap between documented APIs and actual deployed APIs creates security blind spots, compliance risks, and architectural confusion.

API Discovery uses three complementary approaches: traffic analysis (observing network requests), code scanning (analyzing source code and infrastructure), and infrastructure scanning (examining deployed resources). Together, these techniques reveal Shadow APIs, Zombie APIs, and forgotten endpoints that bypass official documentation.

Example

The Healthcare Platform Audit

A healthcare company believed they had 127 documented APIs. During an API Discovery exercise:

  • Traffic Analysis: API Gateway logs revealed 189 unique endpoints receiving requests
  • Code Scanning: GitHub repository analysis found 214 API routes defined in code
  • Infrastructure Scanning: Kubernetes and Lambda scanning discovered 231 deployed functions

Result: They actually had 231 APIs, 104 more than the 127 on record, including:

  • 47 Shadow APIs (never documented)
  • 23 Zombie APIs (abandoned after their owners left, yet still deployed)
  • 34 debug/internal endpoints accidentally exposed

Without API Discovery, these 104 unaccounted-for APIs represented unknown attack surface and compliance violations (for example, processing patient data without the audit logging HIPAA requires).

Analogy

The Archaeological Survey: Imagine surveying an ancient city. You have old maps (documentation), but you don’t know if they’re accurate. So you use three techniques:

  1. Aerial photography (traffic analysis) - See which roads actually have footprints
  2. Ground radar (code scanning) - Detect buried foundations of buildings
  3. Physical excavation (infrastructure) - Dig up what’s actually there

You discover streets not on the maps, buildings the maps claim exist but don’t, and ruins nobody knew about. API Discovery does the same for your infrastructure.

The Census: Governments don’t trust self-reported population data—they conduct actual censuses, going door-to-door counting residents. API Discovery is the census for your infrastructure, actively counting what exists rather than trusting documentation.

Code Example

1. Traffic Analysis Approach:


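# Extract every requested path from the API Gateway access log.
# Assumes Common/Combined Log Format, where the request path is field 7.
awk '/HTTP\// {print $7}' api-gateway.log \
  | sed 's/?.*//' \
  | sort -u > actual-endpoints.txt

# Compare with documented APIs (comm requires both inputs to be sorted)
comm -23 actual-endpoints.txt <(sort -u documented-endpoints.txt) > shadow-apis.txt

# shadow-apis.txt now lists endpoints seen in traffic but missing from the documentation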
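
Raw paths from traffic logs usually contain record IDs, so /api/users/123 and /api/users/456 would be counted as two endpoints. The sketch below collapses such paths before comparing them with documentation. The function name normalize_paths, the {id} placeholder, and the rules for what counts as an ID (all-digit segments and 36-character UUID-like segments) are illustrative assumptions, not part of any particular tool.

```shell
# normalize_paths: read request paths on stdin, print unique endpoint templates.
# Hypothetical helper; the ID heuristics below are assumptions to adjust per API.
normalize_paths() {
  awk '
    {
      sub(/[?#].*$/, "")                      # drop query string and fragment
      n = split($0, seg, "/")
      out = ""
      for (i = 1; i <= n; i++) {
        s = seg[i]
        # all-digit segment, or a 36-character hex-and-hyphen (UUID-like) segment
        if (s ~ /^[0-9]+$/ || (length(s) == 36 && s ~ /^[0-9a-fA-F-]+$/)) s = "{id}"
        out = out (i > 1 ? "/" : "") s
      }
      print out
    }' | LC_ALL=C sort -u
}
```

It can be placed in a pipeline between the path extraction step and the final file, for example `awk '{print $7}' api-gateway.log | normalize_paths > actual-endpoints.txt`. The documented list needs the same normalization for the comparison to be meaningful.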

2. Code Scanning Approach:


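# Collect route paths from Express (JS) and FastAPI (Python) definitions.
# Assumes GNU grep (-P), an app object named "app", and routes under /api/.
{
  # Express: app.get('/api/...'), app.post(...), ...
  grep -rE "app\.(get|post|put|patch|delete)\(" --include="*.js" --exclude-dir=node_modules . \
    | grep -oP '["'\'']\K/api/[^"'\'']+'
  # FastAPI: @app.get("/api/..."), @app.post(...), ...
  grep -rE "@app\.(get|post|put|patch|delete)\(" --include="*.py" . \
    | grep -oP '["'\'']\K/api/[^"'\'']+'
} | sort -u > code-endpoints.txt   # \K drops the leading quote; one sort keeps the file ordered

# Compare with documented endpoints (comm requires both inputs to be sorted)
comm -23 code-endpoints.txt <(sort -u documented-endpoints.txt) > undocumented-in-code.txt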

3. Infrastructure Scanning Approach:


# Scan Kubernetes services for externally exposed ports (all namespaces)
kubectl get services -A -o json \
  | jq -r '.items[]
      | select(.spec.type=="LoadBalancer" or .spec.type=="NodePort")
      | "\(.metadata.namespace)/\(.metadata.name):\(.spec.ports[].port)"' > k8s-exposed.txt

# Scan AWS Lambda functions.
# Heuristic: only matches functions that set an API_ENDPOINT environment variable.
aws lambda list-functions --output json \
  | jq -r '.Functions[] | select(.Environment.Variables.API_ENDPOINT)
  | .FunctionName' > lambda-apis.txt

# Scan Google Cloud Run services
gcloud run services list --format="value(name,url)" > cloudrun-apis.txt

Diagram

graph TB
    A[API Discovery] --> B[Traffic Analysis]
    A --> C[Code Scanning]
    A --> D[Infrastructure Scanning]

    B --> B1[API Gateway Logs]
    B --> B2[Load Balancer Logs]
    B --> B3[WAF Logs]

    C --> C1[Git Repositories]
    C --> C2[Source Code]
    C --> C3[IaC Templates]

    D --> D1[Kubernetes]
    D --> D2[Lambda/Cloud Functions]
    D --> D3[Cloud Services]

    B1 --> E[Consolidated Inventory]
    B2 --> E
    B3 --> E
    C1 --> E
    C2 --> E
    C3 --> E
    D1 --> E
    D2 --> E
    D3 --> E

    E --> F[Compare with Documentation]
    F --> G[Find Shadow APIs]
    F --> H[Find Zombie APIs]
    F --> I[Find Orphaned APIs]

    style A fill:#bbdefb
    style E fill:#c8e6c9
    style F fill:#fff9c4
    style G fill:#ffcdd2
    style H fill:#ffcdd2
    style I fill:#ffcdd2
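
The "Consolidated Inventory" step in the diagram can be sketched as a merge that remembers which method reported each endpoint. This is a minimal illustration, assuming each source has already been reduced to one endpoint path per line. The function name consolidate and the file infra-endpoints.txt are hypothetical; the infrastructure scans in this article produce service and function names that would first need mapping to paths.

```shell
# consolidate LABEL=FILE...: print "endpoint<TAB>sources" for every endpoint
# found in any input file. Hypothetical helper, not a standard tool.
consolidate() {
  for pair in "$@"; do
    label=${pair%%=*}     # text before the first "="
    file=${pair#*=}       # text after the first "="
    # Tag each endpoint with the source that reported it
    awk -v src="$label" 'NF { print $1 "\t" src }' "$file"
  done \
    | LC_ALL=C sort -u \
    | awk -F '\t' '
        {
          if ($1 in seen) seen[$1] = seen[$1] "," $2
          else seen[$1] = $2
        }
        END { for (e in seen) print e "\t" seen[e] }' \
    | LC_ALL=C sort
}
```

A call such as `consolidate traffic=actual-endpoints.txt code=code-endpoints.txt infra=infra-endpoints.txt` yields one row per endpoint. Rows tagged with a single source are the ones a single-method approach would have missed.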

Security Notes

CRITICAL - API Discovery is the foundation of API security. You cannot secure what you don’t know exists.

Why Discovery Matters for Security:

  • Attack surface mapping: Understand all entry points attackers can target
  • Vulnerability management: You can’t patch endpoints you don’t know about
  • Compliance auditing: GDPR, HIPAA, and SOC 2 require knowing what data APIs process
  • Incident response: During breaches, you need complete API inventory immediately
  • Zero Trust implementation: “Trust nothing” requires knowing everything that exists

Discovery Reveals:

  • APIs with no authentication (immediate critical risk)
  • APIs processing PII without proper logging (compliance violation)
  • Deprecated APIs still receiving traffic (potential vulnerabilities)
  • Third-party SDKs making unauthorized external calls (data leakage)
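
As one illustration of the list above, deprecated APIs that still receive traffic can be spotted by joining a list of deprecated paths against the access log. The sketch assumes Common/Combined Log Format with the request path in field 7 (the same assumption as the traffic analysis example) and a hypothetical file of deprecated paths, one per line. The function name live_deprecated is likewise made up for this example.

```shell
# live_deprecated LOG DEPRECATED_FILE: print "hits<TAB>path" for each deprecated
# path that still appears in the log, busiest first. Hypothetical helper.
live_deprecated() {
  log=$1
  deprecated=$2
  awk '
    FILENAME == ARGV[1] { if (NF) dep[$1] = 1; next }   # first file: deprecated paths
    {
      path = $7
      sub(/[?].*$/, "", path)                            # ignore query strings
      if (path in dep) hits[path]++
    }
    END { for (p in hits) print hits[p] "\t" p }
  ' "$deprecated" "$log" | LC_ALL=C sort -k1,1nr -k2
}
```

For example, `live_deprecated api-gateway.log deprecated-endpoints.txt` lists the endpoints whose callers still need to be migrated before the API can be retired.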

Best Practice: Run API Discovery continuously (daily or weekly), not just during audits. APIs appear constantly in modern CI/CD environments.
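
Continuous discovery mostly comes down to comparing each run with the previous one. The following sketch keeps the last inventory on disk and prints only endpoints that were not there before. The function name new_since_last_run and the default state directory .discovery-state are assumptions for illustration.

```shell
# new_since_last_run CURRENT_FILE [STATE_DIR]: print endpoints that were absent
# from the previous run, then store the current run as the new baseline.
# Hypothetical helper; the state directory name is an assumption.
new_since_last_run() {
  current=$1
  state_dir=${2:-.discovery-state}
  mkdir -p "$state_dir"
  previous="$state_dir/previous-endpoints.txt"
  [ -f "$previous" ] || : > "$previous"      # first run: start from an empty baseline
  tmp=$(mktemp)
  LC_ALL=C sort -u "$current" > "$tmp"
  LC_ALL=C comm -13 "$previous" "$tmp"       # lines that appear only in the current run
  mv "$tmp" "$previous"
}
```

Scheduled daily from cron or a CI job, the output can be sent to whoever triages new endpoints. An empty result means nothing new appeared since the last run.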

Discovery Approaches Compared

| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Traffic Analysis | Sees actual usage, finds exposed endpoints | Misses unused APIs, requires log access | Production systems with good logging |
| Code Scanning | Finds APIs before deployment | Misses runtime-generated endpoints | Organizations with access to all repos |
| Infrastructure | Complete deployment view | Cloud-specific tools needed | Multi-cloud environments |
| Combined | Most comprehensive | Most complex to implement | Security-critical organizations |

Common Mistakes

Mistake 1: One-time discovery

APIs are created daily in modern environments. A one-time audit finds what existed that day. Discovery must be continuous to be effective.

Mistake 2: Relying on a single method

Traffic analysis misses unused APIs. Code scanning misses dynamically-generated routes. Infrastructure scanning misses APIs behind proxies. Use all three approaches.

Mistake 3: No follow-up process

Discovering 50 Shadow APIs is useless without a triage process: Which are high-risk? Who owns them? What’s the remediation plan?

Mistake 4: Assuming developers will self-report

“Please register your APIs in the catalog” doesn’t work. Discovery must be automated and enforced, not voluntary.

Discovery Tools

Open Source:

  • Akto: Traffic analysis and API discovery from logs
  • OWASP ZAP: Active scanning to discover endpoints
  • Nuclei: Template-based scanning for API endpoints
  • Spectral: Code scanning for API definitions and secrets

Enterprise:

  • Salt Security: Continuous API discovery via traffic analysis
  • 42Crunch: API security platform with discovery features
  • Traceable AI: ML-powered API discovery and risk assessment
  • Noname Security: Real-time API discovery and inventory

Cloud-Native:

  • Azure API Center: Microsoft’s API inventory and governance
  • AWS API Gateway: Built-in logging and analytics
  • Google Apigee: API management with discovery features

DIY Approach:

  • Log aggregation: ELK Stack, Splunk, Datadog for traffic analysis
  • Code scanning: Custom scripts with grep/ripgrep
  • IaC scanning: Parse Terraform, CloudFormation, Kubernetes YAML

Implementation Roadmap

Phase 1: Baseline Discovery (Week 1-2)

  1. Export documented APIs from current tools (Swagger, Postman)
  2. Run traffic analysis on API Gateway logs (last 30 days)
  3. Scan primary code repositories for route definitions
  4. Document findings: X documented, Y discovered, Z gap
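
The "X documented, Y discovered, Z gap" summary in step 4 can be produced directly from two endpoint lists. This is a small sketch, assuming one endpoint per line in each file; the function name gap_report is hypothetical, and "gap" here means endpoints that were discovered but are not documented.

```shell
# gap_report DOCUMENTED_FILE DISCOVERED_FILE: print the baseline counts.
# Hypothetical helper for illustration.
gap_report() {
  documented=$1
  discovered=$2
  doc_sorted=$(mktemp)
  disc_sorted=$(mktemp)
  LC_ALL=C sort -u "$documented" > "$doc_sorted"
  LC_ALL=C sort -u "$discovered" > "$disc_sorted"
  x=$(awk 'END { print NR }' "$doc_sorted")                        # documented
  y=$(awk 'END { print NR }' "$disc_sorted")                       # discovered
  z=$(LC_ALL=C comm -13 "$doc_sorted" "$disc_sorted" | awk 'END { print NR }')  # discovered only
  printf 'documented=%d discovered=%d gap=%d\n' "$x" "$y" "$z"
  rm -f "$doc_sorted" "$disc_sorted"
}
```

Running it again after each later phase gives a simple trend line for the dashboard: the gap should shrink as Shadow APIs are documented or retired.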

Phase 2: Infrastructure Mapping (Week 3-4)

  1. Scan Kubernetes clusters for exposed services
  2. List all Lambda/Cloud Functions with HTTP triggers
  3. Audit cloud load balancers and API gateways
  4. Cross-reference with Phase 1 findings

Phase 3: Automation (Week 5-8)

  1. Set up automated log analysis (daily runs)
  2. Integrate code scanning into CI/CD pipeline
  3. Schedule infrastructure scans (weekly)
  4. Create dashboard showing: documented vs discovered APIs

Phase 4: Governance (Month 3+)

  1. Establish triage process for newly-discovered APIs
  2. Assign owners to all APIs (including Shadow APIs)
  3. Implement API registration requirements in CI/CD
  4. Monitor for new Shadow APIs continuously
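
A registration requirement in CI/CD can be as simple as a pipeline step that fails when the code defines a route missing from the catalog. The sketch below assumes two plain-text files with one route per line: routes extracted from code and routes exported from the API catalog. The function name check_registered and both file formats are assumptions, not a feature of any specific CI system.

```shell
# check_registered CODE_ROUTES_FILE CATALOG_FILE: return non-zero and list the
# offending routes when code contains routes the catalog does not know about.
# Hypothetical helper for a CI step.
check_registered() {
  code_routes=$1
  catalog=$2
  a=$(mktemp)
  b=$(mktemp)
  LC_ALL=C sort -u "$code_routes" > "$a"
  LC_ALL=C sort -u "$catalog" > "$b"
  missing=$(LC_ALL=C comm -23 "$a" "$b")     # routes in code but not in the catalog
  rm -f "$a" "$b"
  if [ -n "$missing" ]; then
    printf 'Unregistered API routes:\n%s\n' "$missing" >&2
    return 1
  fi
  return 0
}
```

In a pipeline, `check_registered code-endpoints.txt documented-endpoints.txt || exit 1` would block the merge until the new route is registered, which turns voluntary registration into an enforced one.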

Best Practices

  1. Automate everything: Manual discovery doesn’t scale and becomes outdated immediately
  2. Combine approaches: Traffic + Code + Infrastructure gives the most complete picture
  3. Run continuously: Daily or weekly, not quarterly
  4. Triage by risk: Not all Shadow APIs are equal—prioritize by data sensitivity and traffic
  5. Fix the root cause: Discovering Shadow APIs is step 1; preventing new ones is step 2
  6. Document everything: Maintain a living inventory, not static reports
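
Triage by risk (practice 4) can start with a rough score that combines traffic volume and data sensitivity. The scoring below is purely an assumption for illustration: score = (requests + 1) × weight, where weight is 10 if the path contains a sensitive keyword and 1 otherwise. The function name triage_shadow, the SENSITIVE_KEYWORDS variable, and its default keyword list are hypothetical; the log is assumed to have the request path in field 7.

```shell
# triage_shadow SHADOW_FILE LOG: print "score<TAB>requests<TAB>path", highest score first.
# Hypothetical scoring: (requests + 1) * weight, weight 10 for sensitive-looking paths.
triage_shadow() {
  shadow=$1
  log=$2
  awk -v keywords="${SENSITIVE_KEYWORDS:-patient|payment|admin|auth|token}" '
    FILENAME == ARGV[1] { if (NF) shadow[$1] = 1; next }   # first file: Shadow API paths
    {
      path = $7
      sub(/[?].*$/, "", path)                               # ignore query strings
      if (path in shadow) hits[path]++
    }
    END {
      for (p in shadow) {
        n = hits[p] + 0
        weight = (tolower(p) ~ keywords) ? 10 : 1
        print (n + 1) * weight "\t" n "\t" p
      }
    }
  ' "$shadow" "$log" | LC_ALL=C sort -k1,1nr -k3
}
```

A call like `triage_shadow shadow-apis.txt api-gateway.log` gives a first ordering for review. Keyword matching is only a proxy for data sensitivity, so the top entries still need a human to confirm what data they actually handle.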

Assessment Tool

Start Here: Use the Shadow API Detection Tool to:

  • Assess your organization’s API discovery maturity
  • Get recommended discovery approaches for your architecture
  • Estimate effort required for complete API inventory
  • Identify quick wins for initial discovery project

Further Reading

📚 Book: “Cómo Identificar Shadow APIs Con Herramientas Open Source”

Comprehensive guide to API Discovery with practical implementation details:

Key chapters:

  • Chapter 3: Traffic Analysis for API Discovery (Akto, custom scripts)
  • Chapter 4: Code Scanning Strategies (GitHub, GitLab, Bitbucket)
  • Chapter 5: Infrastructure Scanning (Kubernetes, AWS, Azure, GCP)
  • Chapter 7: Building an Automated Discovery Pipeline
  • Chapter 9: Tool Comparison Matrix (20+ tools evaluated)