API Discovery

Definition

API Discovery is the continuous process of identifying all API endpoints that exist in an organization’s infrastructure, whether documented or not. It answers the fundamental question: “What APIs do we actually have running in production?”

Discovery is critical because organizations rarely have accurate inventories of their APIs. According to industry research, 50% of organizations don’t know how many APIs they have in production. This gap between documented and actually deployed APIs creates security blind spots, compliance risks, and architectural confusion.

API Discovery uses three complementary approaches: traffic analysis (observing network requests), code scanning (analyzing source code and infrastructure-as-code templates), and infrastructure scanning (examining deployed resources). Together, these techniques reveal Shadow APIs, Zombie APIs, and forgotten endpoints that are missing from official documentation.

Example

The Healthcare Platform Audit

A healthcare company believed they had 127 documented APIs. During an API Discovery exercise:

Traffic Analysis: API Gateway logs revealed 189 unique endpoints receiving requests
Code Scanning: GitHub repository analysis found 214 API routes defined in code
Infrastructure Scanning: Kubernetes and Lambda scanning discovered 231 deployed functions

Result: They actually had 231 APIs, including:

47 Shadow APIs (never documented)
23 Zombie APIs (documented but owners left)
34 debug/internal endpoints accidentally exposed

Without API Discovery, these 104 unaccounted-for APIs represented an unknown attack surface and potential compliance violations (for example, processing patient data without the audit logging that HIPAA requires).
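
The comparison behind numbers like these can be sketched in a few lines of shell. This is a minimal illustration with toy data, not the audit's actual tooling: it assumes each approach has already produced a plain-text list of endpoint paths, one per line, and the file names and the `classify_inventory` function are hypothetical.

```shell
# Sketch only: file names and sample endpoints are hypothetical.
set -eu
export LC_ALL=C   # sort and comm must agree on ordering

# classify_inventory DOCUMENTED SOURCE...
# Writes inventory.txt, undocumented.txt and not-observed.txt
# into the current directory.
classify_inventory() {
  documented=$1
  shift
  sort -u "$@" > inventory.txt              # union of all discovery sources
  sort -u "$documented" > documented.sorted.txt
  # Discovered but not documented: Shadow API candidates
  comm -23 inventory.txt documented.sorted.txt > undocumented.txt
  # Documented but found by no source: stale docs or an unscanned system
  comm -13 inventory.txt documented.sorted.txt > not-observed.txt
}

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' /api/users /api/orders            > documented-endpoints.txt
printf '%s\n' /api/users /api/debug             > traffic-endpoints.txt
printf '%s\n' /api/users /api/export            > code-endpoints.txt
printf '%s\n' /api/debug /api/internal/metrics  > infra-endpoints.txt

classify_inventory documented-endpoints.txt \
  traffic-endpoints.txt code-endpoints.txt infra-endpoints.txt

echo "Undocumented:"; cat undocumented.txt
echo "Not observed:"; cat not-observed.txt
```

Taking the union first means an endpoint found by any one approach enters the inventory, which is why the combined count can exceed what any single source reports.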

Analogy

The Archaeological Survey: Imagine surveying an ancient city. You have old maps (documentation), but you don’t know if they’re accurate. So you use three techniques:

Aerial photography (traffic analysis) - See which roads actually have footprints
Ground radar (code scanning) - Detect buried foundations of buildings
Physical excavation (infrastructure scanning) - Dig up what’s actually there

You discover streets not on the maps, buildings the maps claim exist but don’t, and ruins nobody knew about. API Discovery does the same for your infrastructure.

The Census: Governments don’t trust self-reported population data—they conduct actual censuses, going door-to-door counting residents. API Discovery is the census for your infrastructure, actively counting what exists rather than trusting documentation.

Code Example

1. Traffic Analysis Approach:


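# Analyze API Gateway logs to find all accessed endpoints.
# Assumes Common Log Format, where the request path is the 7th field.
grep -E 'HTTP/[0-9.]+' api-gateway.log | awk '{print $7}' | sort -u > actual-endpoints.txt

# Compare with documented APIs (comm requires both inputs to be sorted)
sort -u documented-endpoints.txt | comm -23 actual-endpoints.txt - > shadow-apis.txt

# Result: shadow-apis.txt lists endpoints seen in traffic but not in documentation
# (47 in the healthcare example above)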
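
Raw request paths usually carry query strings and resource IDs, so de-duplicating them directly would count `/api/users/1` and `/api/users/2` as two endpoints. The sketch below shows one way to collapse paths into templates before comparison. It is a heuristic under stated assumptions: the `normalize_paths` function and the `{id}` placeholder are hypothetical, paths are assumed to start with `/`, and only purely numeric segments and 36-character hex-and-hyphen segments (UUID-like) are treated as IDs.

```shell
# Sketch only: normalize_paths and the {id} placeholder are hypothetical.
set -eu
export LC_ALL=C

# Reads request paths on stdin, writes unique path templates on stdout.
normalize_paths() {
  awk '
    /^$/ { next }
    {
      sub(/[?#].*$/, "")               # drop query string and fragment
      n = split($0, seg, "/")
      out = ""
      for (i = 2; i <= n; i++) {
        s = seg[i]
        if (s == "") continue          # skip empty segments (trailing slash)
        if (s ~ /^[0-9]+$/) s = "{id}"                               # numeric ID
        else if (length(s) == 36 && s ~ /^[0-9a-fA-F-]+$/) s = "{id}"  # UUID-like
        out = out "/" s
      }
      print (out == "" ? "/" : out)
    }' | sort -u
}

printf '%s\n' \
  '/api/users/42?expand=true' \
  '/api/users/7' \
  '/api/orders/550e8400-e29b-41d4-a716-446655440000/items' \
  '/api/health/' \
  | normalize_paths
```

In the pipeline above, this function would sit between the `awk '{print $7}'` step and the comparison, and the documented list would need to use the same template form.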

2. Code Scanning Approach:


# Find Express route definitions and extract the /api/... path.
# \K drops the opening quote from the match so paths compare cleanly.
grep -r "app\.get\|app\.post\|app\.put\|app\.delete" --include="*.js" . \
  | grep -oP '["'\'']\K/api/[^"'\'']+' \
  > code-endpoints.raw

# Find FastAPI route decorators
grep -r "@app\.\(get\|post\|put\|delete\)" --include="*.py" . \
  | grep -oP '["'\'']\K/api/[^"'\'']+' \
  >> code-endpoints.raw

# De-duplicate across both frameworks (comm requires sorted input)
sort -u code-endpoints.raw > code-endpoints.txt

# Compare with documented endpoints
sort -u documented-endpoints.txt | comm -23 code-endpoints.txt - > undocumented-in-code.txt

3. Infrastructure Scanning Approach:


# Scan Kubernetes services for externally exposed ports
# (namespace is included because -A spans all namespaces)
kubectl get services -A -o json \
  | jq -r '.items[] | select(.spec.type=="LoadBalancer" or .spec.type=="NodePort")
  | "\(.metadata.namespace)/\(.metadata.name):\(.spec.ports[].port)"' > k8s-exposed.txt

# Scan AWS Lambda functions (current region only).
# Heuristic: assumes API-backed functions set an API_ENDPOINT environment variable.
aws lambda list-functions --output json \
  | jq -r '.Functions[] | select(.Environment.Variables.API_ENDPOINT)
  | .FunctionName' > lambda-apis.txt

# Scan Google Cloud Run services
gcloud run services list --format="value(metadata.name,status.url)" > cloudrun-apis.txt

Diagram

graph TB
    A[API Discovery] --> B[Traffic Analysis]
    A --> C[Code Scanning]
    A --> D[Infrastructure Scanning]

    B --> B1[API Gateway Logs]
    B --> B2[Load Balancer Logs]
    B --> B3[WAF Logs]

    C --> C1[Git Repositories]
    C --> C2[Source Code]
    C --> C3[IaC Templates]

    D --> D1[Kubernetes]
    D --> D2[Lambda/Cloud Functions]
    D --> D3[Cloud Services]

    B1 --> E[Consolidated Inventory]
    B2 --> E
    B3 --> E
    C1 --> E
    C2 --> E
    C3 --> E
    D1 --> E
    D2 --> E
    D3 --> E

    E --> F[Compare with Documentation]
    F --> G[Find Shadow APIs]
    F --> H[Find Zombie APIs]
    F --> I[Find Orphaned APIs]

    style A fill:#bbdefb
    style E fill:#c8e6c9
    style F fill:#fff9c4
    style G fill:#ffcdd2
    style H fill:#ffcdd2
    style I fill:#ffcdd2

Security Notes

CRITICAL - API Discovery is the foundation of API security. You cannot secure what you don’t know exists.

Why Discovery Matters for Security:

Attack surface mapping: Understand all entry points attackers can target
Vulnerability management: You can’t patch endpoints you don’t know about
Compliance auditing: Frameworks such as GDPR, HIPAA, and SOC 2 require knowing what data your APIs process
Incident response: During breaches, you need complete API inventory immediately
Zero Trust implementation: “Trust nothing” requires knowing everything that exists

Discovery Reveals:

APIs with no authentication (immediate critical risk)
APIs processing PII without proper logging (compliance violation)
Deprecated APIs still receiving traffic (potential vulnerabilities)
Third-party SDKs making unauthorized external calls (data leakage)

Best Practice: Run API Discovery continuously (daily or weekly), not just during audits. APIs appear constantly in modern CI/CD environments.
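
One way to make frequent runs actionable is to keep each run's inventory and report only what changed since the previous one. The following is a minimal sketch, assuming each run writes a snapshot file with one endpoint per line; the snapshot file names and the `new_endpoints` function are hypothetical.

```shell
# Sketch only: snapshot layout and function name are hypothetical.
set -eu
export LC_ALL=C

# new_endpoints PREVIOUS CURRENT
# Prints endpoints that appear in CURRENT but not in PREVIOUS.
new_endpoints() {
  prev_sorted=$(mktemp)
  curr_sorted=$(mktemp)
  sort -u "$1" > "$prev_sorted"
  sort -u "$2" > "$curr_sorted"
  comm -13 "$prev_sorted" "$curr_sorted"
  rm -f "$prev_sorted" "$curr_sorted"
}

snapshots=$(mktemp -d)
printf '%s\n' /api/users /api/orders           > "$snapshots/day1.txt"
printf '%s\n' /api/users /api/orders /api/beta > "$snapshots/day2.txt"

added=$(new_endpoints "$snapshots/day1.txt" "$snapshots/day2.txt")
if [ -n "$added" ]; then
  echo "New endpoints since last run:"
  echo "$added"
fi
```

A script like this could be run from a scheduler such as cron or a nightly pipeline job, with the output routed to whoever triages new findings.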

Discovery Approaches Compared

Approach	Pros	Cons	Best For
Traffic Analysis	Sees actual usage, finds exposed endpoints	Misses unused APIs, requires log access	Production systems with good logging
Code Scanning	Finds APIs before deployment	Misses runtime-generated endpoints	Organizations with access to all repos
Infrastructure Scanning	Broad view of what is deployed	Requires cloud-specific tools; can miss APIs behind proxies	Multi-cloud environments
Combined	Most comprehensive	Most complex to implement	Security-critical organizations

Common Mistakes

Mistake 1: One-time discovery
APIs are created daily in modern environments. A one-time audit finds what existed that day. Discovery must be continuous to be effective.

Mistake 2: Relying on a single method
Traffic analysis misses unused APIs. Code scanning misses dynamically-generated routes. Infrastructure scanning misses APIs behind proxies. Use all three approaches.

Mistake 3: No follow-up process
Discovering 50 Shadow APIs is useless without a triage process: Which are high-risk? Who owns them? What’s the remediation plan?

Mistake 4: Assuming developers will self-report
“Please register your APIs in the catalog” doesn’t work. Discovery must be automated and enforced, not voluntary.

Discovery Tools

Open Source:

Akto: Traffic analysis and API discovery from logs
OWASP ZAP: Active scanning to discover endpoints
Nuclei: Template-based scanning for API endpoints
Spectral: Code scanning for API definitions and secrets

Enterprise:

Salt Security: Continuous API discovery via traffic analysis
42Crunch: API security platform with discovery features
Traceable AI: ML-powered API discovery and risk assessment
Noname Security: Real-time API discovery and inventory

Cloud-Native:

Azure API Center: Microsoft’s API inventory and governance
AWS API Gateway: Built-in logging and analytics
Google Apigee: API management with discovery features

DIY Approach:

Log aggregation: ELK Stack, Splunk, Datadog for traffic analysis
Code scanning: Custom scripts with grep/ripgrep
IaC scanning: Parse Terraform, CloudFormation, Kubernetes YAML

Implementation Roadmap

Phase 1: Baseline Discovery (Weeks 1-2)

Export documented APIs from current tools (Swagger, Postman)
Run traffic analysis on API Gateway logs (last 30 days)
Scan primary code repositories for route definitions
Document findings: X documented, Y discovered, Z gap

Phase 2: Infrastructure Mapping (Weeks 3-4)

Scan Kubernetes clusters for exposed services
List all Lambda/Cloud Functions with HTTP triggers
Audit cloud load balancers and API gateways
Cross-reference with Phase 1 findings

Phase 3: Automation (Weeks 5-8)

Set up automated log analysis (daily runs)
Integrate code scanning into CI/CD pipeline
Schedule infrastructure scans (weekly)
Create dashboard showing: documented vs discovered APIs

Phase 4: Governance (Month 3+)

Establish triage process for newly-discovered APIs
Assign owners to all APIs (including Shadow APIs)
Implement API registration requirements in CI/CD
Monitor for new Shadow APIs continuously
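
A registration requirement in CI/CD could take the form of a pipeline step that fails when the code defines routes that are missing from the API catalog. The sketch below illustrates the idea under assumptions: the catalog is a plain-text file of registered paths, the code endpoints have already been extracted to a file, and the `check_registered` function and file names are hypothetical.

```shell
# Sketch only: catalog format, file names and function name are hypothetical.
set -eu

# check_registered CODE_ENDPOINTS CATALOG
# Returns 0 when every line of CODE_ENDPOINTS appears in CATALOG;
# otherwise prints the missing endpoints to stderr and returns 1.
check_registered() {
  # -F fixed strings, -x whole-line match, -v invert: keep unregistered lines
  missing=$(grep -Fvx -f "$2" "$1" || true)
  if [ -n "$missing" ]; then
    printf 'Unregistered endpoints:\n%s\n' "$missing" >&2
    return 1
  fi
}

ci=$(mktemp -d)
printf '%s\n' /api/users /api/orders > "$ci/catalog.txt"
printf '%s\n' /api/users /api/export > "$ci/code-endpoints.txt"

if check_registered "$ci/code-endpoints.txt" "$ci/catalog.txt"; then
  echo "All endpoints registered"
else
  echo "Build would fail here"   # a real pipeline step would exit non-zero
fi
```

Failing the build turns registration from a request into a gate, so new endpoints cannot ship without a catalog entry.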

Best Practices

Automate everything: Manual discovery doesn’t scale and becomes outdated immediately
Combine approaches: Traffic + Code + Infrastructure gives complete picture
Run continuously: Daily or weekly, not quarterly
Triage by risk: Not all Shadow APIs are equal—prioritize by data sensitivity and traffic
Fix the root cause: Discovering Shadow APIs is step 1; preventing new ones is step 2
Document everything: Maintain living inventory, not static reports
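
As a rough illustration of triage by traffic, the sketch below ranks undocumented endpoints by how many requests each received. It assumes one request path per line (for example, extracted from gateway logs) and a separate list of Shadow API paths; the `rank_by_traffic` function and file names are hypothetical, and data sensitivity would need a separate input that is not modeled here.

```shell
# Sketch only: input files and function name are hypothetical.
set -eu
export LC_ALL=C

# rank_by_traffic REQUEST_PATHS SHADOW_LIST
# Prints "count path" for each shadow endpoint seen in traffic, busiest first.
rank_by_traffic() {
  grep -Fx -f "$2" "$1" \
    | sort | uniq -c \
    | sort -rn \
    | awk '{print $1, $2}'
}

triage=$(mktemp -d)
printf '%s\n' /api/debug /api/users /api/debug /api/export /api/users /api/debug \
  > "$triage/requests.txt"
printf '%s\n' /api/debug /api/export > "$triage/shadow.txt"

rank_by_traffic "$triage/requests.txt" "$triage/shadow.txt"
```

Shadow endpoints that never appear in the request file produce no output line, so they need a separate review rather than being treated as safe.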

Assessment Tool

Start Here: Use the Shadow API Detection Tool to:

Assess your organization’s API discovery maturity
Get recommended discovery approaches for your architecture
Estimate effort required for complete API inventory
Identify quick wins for initial discovery project