Definition
API Discovery is the continuous process of identifying all API endpoints that exist in an organization’s infrastructure, whether documented or not. It answers the fundamental question: “What APIs do we actually have running in production?”
Discovery is critical because organizations rarely have accurate inventories of their APIs. According to industry research, 50% of organizations don’t know how many APIs they have in production. This gap between documented APIs and actually deployed APIs creates security blind spots, compliance risks, and architectural confusion.
API Discovery uses three complementary approaches: traffic analysis (observing network requests), code scanning (analyzing source code and infrastructure-as-code templates), and infrastructure scanning (examining deployed resources). Together, these techniques reveal Shadow APIs, Zombie APIs, and forgotten endpoints that official documentation does not cover.
Example
The Healthcare Platform Audit
A healthcare company believed they had 127 documented APIs. During an API Discovery exercise:
- Traffic Analysis: API Gateway logs revealed 189 unique endpoints receiving requests
- Code Scanning: GitHub repository analysis found 214 API routes defined in code
- Infrastructure Scanning: Kubernetes and Lambda scanning discovered 231 deployed services and functions
Result: They actually had 231 APIs, including:
- 47 Shadow APIs (never documented)
- 23 Zombie APIs (once documented, but abandoned after their owners left)
- 34 debug/internal endpoints accidentally exposed
Without API Discovery, these 104 APIs outside the official inventory represented unknown attack surface and compliance violations (processing patient data without HIPAA-required audit logging).
Analogy
The Archaeological Survey: Imagine surveying an ancient city. You have old maps (documentation), but you don’t know if they’re accurate. So you use three techniques:
- Aerial photography (traffic analysis) - See which roads actually have footprints
- Ground radar (code scanning) - Detect buried foundations of buildings
- Physical excavation (infrastructure) - Dig up what’s actually there
You discover streets not on the maps, buildings the maps claim exist but don’t, and ruins nobody knew about. API Discovery does the same for your infrastructure.
The Census: Governments don’t trust self-reported population data—they conduct actual censuses, going door-to-door counting residents. API Discovery is the census for your infrastructure, actively counting what exists rather than trusting documentation.
Code Example
1. Traffic Analysis Approach:
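```shell
# Extract request paths (field 7) from Common/Combined Log Format lines whose
# field 6 is a quoted HTTP method such as "GET; adjust for JSON/custom formats
awk '$6 ~ /^"[A-Z]+$/ {print $7}' api-gateway.log \
  | sed 's/?.*$//' \
  | sort -u > actual-endpoints.txt
# comm requires both inputs to be sorted
sort -u -o documented-endpoints.txt documented-endpoints.txt
# Endpoints that receive traffic but are missing from the documentation
comm -23 actual-endpoints.txt documented-endpoints.txt > shadow-apis.txt
```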
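Raw paths from logs usually embed resource IDs, so `/api/users/123` and `/api/users/456` would otherwise count as two endpoints and inflate the shadow list. A minimal sketch of path normalization, assuming numeric IDs and canonical UUIDs are the only path parameters; `normalize_paths` is a hypothetical helper, not part of any tool named in this article:

```shell
# normalize_paths (hypothetical helper): collapse ID-like path segments so
# that /api/users/123 and /api/users/456 count as one endpoint.
# Assumes IDs are either all digits or canonical 36-character UUIDs.
normalize_paths() {
  awk '{
    sub(/[?].*$/, "")                      # drop the query string
    n = split($0, seg, "/")
    out = ""
    for (i = 1; i <= n; i++) {
      s = seg[i]
      is_uuid = (length(s) == 36 &&
        s ~ /^[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+$/)
      if (s ~ /^[0-9]+$/ || is_uuid) s = "{id}"
      out = (i == 1) ? s : out "/" s       # rebuild the path segment by segment
    }
    print out
  }' | sort -u
}
```

For example, pipe the extracted paths through `normalize_paths` in place of the plain `sort -u > actual-endpoints.txt` step. The documented list needs the same placeholder convention before the two are compared.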
2. Code Scanning Approach:
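```shell
# Find Express route definitions (app.* or router.*); -h drops file names and
# \K (GNU grep -P) drops the opening quote so only the path is printed
grep -rhE "(app|router)\.(get|post|put|patch|delete)\(" --include="*.js" . \
  | grep -oP "[\"']\K/api/[^\"']+" > code-endpoints.raw
# Find FastAPI route decorators (@app.* or @router.*)
grep -rhE "@(app|router)\.(get|post|put|patch|delete)\(" --include="*.py" . \
  | grep -oP "[\"']\K/api/[^\"']+" >> code-endpoints.raw
# Sort the combined list once so comm receives sorted input
sort -u code-endpoints.raw > code-endpoints.txt
# Compare with documented endpoints (which must also be sorted)
comm -23 code-endpoints.txt documented-endpoints.txt > undocumented-in-code.txt
```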
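Frameworks spell path parameters differently: Express writes `/api/users/:id`, while FastAPI and OpenAPI documents write `/api/users/{user_id}`. Unless both lists share a placeholder, `comm` reports the same route as undocumented. A minimal sketch, assuming parameters appear either as `:name` segments or as `{name}` segments; `canonical_params` is a hypothetical helper:

```shell
# canonical_params (hypothetical helper): rewrite Express-style ":name" and
# FastAPI/OpenAPI-style "{name}" path parameters to a neutral "{}" placeholder.
canonical_params() {
  sed -E -e 's#/:[A-Za-z_][A-Za-z0-9_]*#/{}#g' \
         -e 's#\{[^}/]*\}#{}#g' \
    | sort -u
}
```

Apply it to both `code-endpoints.txt` and `documented-endpoints.txt` (e.g. `canonical_params < code-endpoints.txt > code-canonical.txt`) and compare the results, so `:id` and `{user_id}` no longer register as different routes.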
3. Infrastructure Scanning Approach:
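```shell
# Kubernetes: services exposed outside the cluster (LoadBalancer or NodePort)
kubectl get services -A -o json \
  | jq -r '.items[] | select(.spec.type=="LoadBalancer" or .spec.type=="NodePort")
      | "\(.metadata.namespace)/\(.metadata.name):\(.spec.ports[].port)"' > k8s-exposed.txt
# AWS Lambda: functions reachable through a Function URL (AuthType NONE = public)
for fn in $(aws lambda list-functions --query 'Functions[].FunctionName' --output text); do
  aws lambda list-function-url-configs --function-name "$fn" \
    --query 'FunctionUrlConfigs[].[FunctionArn,FunctionUrl,AuthType]' --output text
done > lambda-urls.txt
# AWS API Gateway: HTTP and WebSocket APIs (v2) that may front Lambda functions
aws apigatewayv2 get-apis --query 'Items[].[Name,ApiEndpoint]' --output text > apigw-apis.txt
# Google Cloud Run: service names and URLs
gcloud run services list --format="value(metadata.name,status.url)" > cloudrun-apis.txt
```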
Diagram
```mermaid
graph TB
A[API Discovery] --> B[Traffic Analysis]
A --> C[Code Scanning]
A --> D[Infrastructure Scanning]
B --> B1[API Gateway Logs]
B --> B2[Load Balancer Logs]
B --> B3[WAF Logs]
C --> C1[Git Repositories]
C --> C2[Source Code]
C --> C3[IaC Templates]
D --> D1[Kubernetes]
D --> D2[Lambda/Cloud Functions]
D --> D3[Cloud Services]
B1 --> E[Consolidated Inventory]
B2 --> E
B3 --> E
C1 --> E
C2 --> E
C3 --> E
D1 --> E
D2 --> E
D3 --> E
E --> F[Compare with Documentation]
F --> G[Find Shadow APIs]
F --> H[Find Zombie APIs]
F --> I[Find Orphaned APIs]
style A fill:#bbdefb
style E fill:#c8e6c9
style F fill:#fff9c4
style G fill:#ffcdd2
style H fill:#ffcdd2
style I fill:#ffcdd2
```
Security Notes
CRITICAL - API Discovery is the foundation of API security. You cannot secure what you don’t know exists.
Why Discovery Matters for Security:
- Attack surface mapping: Understand all entry points attackers can target
- Vulnerability management: You can’t patch endpoints you don’t know about
- Compliance auditing: GDPR, HIPAA, and SOC 2 require knowing what data your APIs process
- Incident response: During a breach, you need a complete API inventory immediately
- Zero Trust implementation: “Trust nothing” requires knowing everything that exists
Discovery Reveals:
- APIs with no authentication (immediate critical risk)
- APIs processing PII without proper logging (compliance violation)
- Deprecated APIs still receiving traffic (potential vulnerabilities)
- Third-party SDKs making unauthorized external calls (data leakage)
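Once discovered endpoints are enriched with metadata, the first two findings in this list reduce to simple filters. A minimal sketch, assuming a hypothetical tab-separated inventory whose columns are path, authentication scheme (`none` when the endpoint is open), whether it handles PII (`yes`/`no`), and whether audit logging is enabled (`yes`/`no`). Collecting those attributes is the hard part and is not shown; `flag_risks` is a hypothetical helper:

```shell
# flag_risks (hypothetical helper): read a TSV inventory on stdin with columns
#   path  auth  handles_pii  audit_logging
# and print one "SEVERITY<TAB>path<TAB>reason" line per finding.
flag_risks() {
  awk -F '\t' -v OFS='\t' '
    $2 == "none"              { print "CRITICAL", $1, "no authentication" }
    $3 == "yes" && $4 == "no" { print "HIGH", $1, "PII without audit logging" }
  '
}
```

An endpoint that is both unauthenticated and logging-free for PII produces two lines, which keeps each finding separately actionable.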
Best Practice: Run API Discovery continuously (daily or weekly), not just during audits. APIs appear constantly in modern CI/CD environments.
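Recurring runs pay off when each run is compared with the previous one, so new endpoints surface as soon as they appear. A minimal sketch, assuming each run saves its consolidated inventory as a plain list of paths; `new_endpoints` and the snapshot file names in the usage note are hypothetical:

```shell
# new_endpoints PREVIOUS CURRENT (hypothetical helper): print paths present in
# the CURRENT inventory snapshot but absent from the PREVIOUS one.
new_endpoints() {
  prev_sorted=$(mktemp)
  curr_sorted=$(mktemp)
  sort -u "$1" > "$prev_sorted"
  sort -u "$2" > "$curr_sorted"
  comm -13 "$prev_sorted" "$curr_sorted"   # lines unique to the second file
  rm -f "$prev_sorted" "$curr_sorted"
}
```

A nightly job could run `new_endpoints inventory-previous.txt inventory-current.txt` and open a triage ticket whenever the output is non-empty.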
Discovery Approaches Compared
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Traffic Analysis | Sees actual usage, finds exposed endpoints | Misses unused APIs, requires log access | Production systems with good logging |
| Code Scanning | Finds APIs before deployment | Misses runtime-generated endpoints | Organizations with access to all repos |
| Infrastructure Scanning | Broad view of deployed resources | Cloud-specific tools needed; misses APIs behind proxies | Multi-cloud environments |
| Combined | Most comprehensive | Most complex to implement | Security-critical organizations |
Common Mistakes
Mistake 1: One-time discovery
APIs are created daily in modern environments, so a one-time audit finds only what existed that day. Discovery must be continuous to be effective.
Mistake 2: Relying on a single method
Traffic analysis misses unused APIs. Code scanning misses dynamically generated routes. Infrastructure scanning misses APIs behind proxies. Use all three approaches.
Mistake 3: No follow-up process
Discovering 50 Shadow APIs is useless without a triage process: Which are high-risk? Who owns them? What’s the remediation plan?
Mistake 4: Assuming developers will self-report
Asking developers to “please register your APIs in the catalog” doesn’t work. Discovery must be automated and enforced, not voluntary.
Discovery Tools
Open Source:
- Akto: Traffic analysis and API discovery from logs
- OWASP ZAP: Active scanning to discover endpoints
- Nuclei: Template-based scanning for API endpoints
- Spectral: Linting of OpenAPI and AsyncAPI definition files found in repositories
Enterprise:
- Salt Security: Continuous API discovery via traffic analysis
- 42Crunch: API security platform with discovery features
- Traceable AI: ML-powered API discovery and risk assessment
- Noname Security: Real-time API discovery and inventory
Cloud-Native:
- Azure API Center: Microsoft’s API inventory and governance
- AWS API Gateway: Built-in logging and analytics
- Google Apigee: API management with discovery features
DIY Approach:
- Log aggregation: ELK Stack, Splunk, Datadog for traffic analysis
- Code scanning: Custom scripts with grep/ripgrep
- IaC scanning: Parse Terraform, CloudFormation, Kubernetes YAML
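IaC scanning can start as simple text extraction from templates. A minimal sketch for Terraform, assuming routes are declared with the `aws_apigatewayv2_route` resource, whose `route_key` attribute holds quoted values such as `"GET /api/users"`. It matches lines textually rather than parsing HCL, so route keys built from variables or modules are missed; `iac_routes` is a hypothetical helper:

```shell
# iac_routes DIR (hypothetical helper): list quoted route_key values declared in
# Terraform files under DIR, e.g. "GET /api/users". Text matching, not HCL parsing.
iac_routes() {
  grep -rhE '^[[:space:]]*route_key[[:space:]]*=[[:space:]]*"' --include='*.tf' "$1" \
    | sed -E 's/^[^"]*"([^"]*)".*$/\1/' \
    | sort -u
}
```

The same pattern (grep for the attribute, extract the quoted value) extends to other resource types, but a real HCL or YAML parser is the safer choice once templates use variables heavily.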
Implementation Roadmap
Phase 1: Baseline Discovery (Week 1-2)
- Export documented APIs from current tools (Swagger, Postman)
- Run traffic analysis on API Gateway logs (last 30 days)
- Scan primary code repositories for route definitions
- Document findings: X documented, Y discovered, Z gap
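The X/Y/Z summary in Phase 1 is a set-difference count. A minimal sketch, assuming the documented and discovered inventories are plain lists of normalized paths; `summarize_gap` is a hypothetical helper, and "coverage" here is an illustrative metric meaning the share of discovered endpoints that are documented:

```shell
# summarize_gap DOCUMENTED DISCOVERED (hypothetical helper): print counts for
# documented (X), discovered (Y), and discovered-but-undocumented (Z) endpoints.
summarize_gap() {
  doc_sorted=$(mktemp)
  disc_sorted=$(mktemp)
  sort -u "$1" > "$doc_sorted"
  sort -u "$2" > "$disc_sorted"
  x=$(wc -l < "$doc_sorted" | tr -d ' ')
  y=$(wc -l < "$disc_sorted" | tr -d ' ')
  z=$(comm -23 "$disc_sorted" "$doc_sorted" | wc -l | tr -d ' ')
  coverage=0
  if [ "$y" -gt 0 ]; then
    coverage=$(( (y - z) * 100 / y ))   # integer percentage, rounded down
  fi
  echo "documented=$x discovered=$y gap=$z coverage=${coverage}%"
  rm -f "$doc_sorted" "$disc_sorted"
}
```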
Phase 2: Infrastructure Mapping (Week 3-4)
- Scan Kubernetes clusters for exposed services
- List all Lambda/Cloud Functions with HTTP triggers
- Audit cloud load balancers and API gateways
- Cross-reference with Phase 1 findings
Phase 3: Automation (Week 5-8)
- Set up automated log analysis (daily runs)
- Integrate code scanning into CI/CD pipeline
- Schedule infrastructure scans (weekly)
- Create a dashboard comparing documented and discovered APIs
Phase 4: Governance (Month 3+)
- Establish triage process for newly-discovered APIs
- Assign owners to all APIs (including Shadow APIs)
- Implement API registration requirements in CI/CD
- Monitor for new Shadow APIs continuously
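The registration requirement in Phase 4 can be enforced as a CI step that fails the build when code defines a route the catalog does not list. A minimal sketch, assuming the pipeline already produces a list of routes from code scanning and an exported catalog list, both with one normalized path per line; `check_registered` is a hypothetical helper, not a feature of any catalog product mentioned in this article:

```shell
# check_registered CATALOG ROUTES (hypothetical helper): succeed if every route
# in ROUTES appears verbatim in CATALOG; otherwise list the missing routes on
# stderr and return 1, which fails the CI job.
check_registered() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from CATALOG
  missing=$(grep -Fxv -f "$1" "$2" | sort -u)
  if [ -n "$missing" ]; then
    echo "Unregistered API routes (add them to the catalog):" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
  echo "All routes are registered."
}
```

In a pipeline step this might look like `check_registered catalog-endpoints.txt code-endpoints.txt || exit 1`.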
Best Practices
- Automate everything: Manual discovery doesn’t scale and becomes outdated immediately
- Combine approaches: Traffic + Code + Infrastructure gives the most complete picture
- Run continuously: Daily or weekly, not quarterly
- Triage by risk: Not all Shadow APIs are equal—prioritize by data sensitivity and traffic
- Fix the root cause: Discovering Shadow APIs is step 1; preventing new ones is step 2
- Document everything: Maintain a living inventory, not static reports
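Triage by risk can start as a sort. A minimal sketch, assuming each discovered Shadow API has been labelled with a data-sensitivity level (`high`, `medium`, or `low`) and a request count from traffic analysis, in a hypothetical tab-separated file; `rank_shadow_apis` orders rows by sensitivity first and traffic second:

```shell
# rank_shadow_apis (hypothetical helper): read "path<TAB>sensitivity<TAB>requests"
# on stdin and print the same rows ordered by sensitivity (high, medium, then
# anything else), breaking ties by request count, highest first.
rank_shadow_apis() {
  tab=$(printf '\t')
  awk -F '\t' -v OFS='\t' '{
    rank = ($2 == "high") ? 1 : (($2 == "medium") ? 2 : 3)
    print rank, $1, $2, $3           # prepend a numeric rank column for sorting
  }' | sort -t "$tab" -k1,1n -k4,4nr | cut -f 2-
}
```

Ranking sensitivity ahead of traffic means a rarely called endpoint that touches patient data still outranks a busy debug endpoint that does not; swap the sort keys if your risk model weighs exposure more heavily.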
Assessment Tool
Start Here: Use the Shadow API Detection Tool to:
- Assess your organization’s API discovery maturity
- Get recommended discovery approaches for your architecture
- Estimate effort required for complete API inventory
- Identify quick wins for initial discovery project
Further Reading
📚 Book: “Cómo Identificar Shadow APIs Con Herramientas Open Source” (How to Identify Shadow APIs with Open Source Tools)
A comprehensive guide to API Discovery with practical implementation details.
Key chapters:
- Chapter 3: Traffic Analysis for API Discovery (Akto, custom scripts)
- Chapter 4: Code Scanning Strategies (GitHub, GitLab, Bitbucket)
- Chapter 5: Infrastructure Scanning (Kubernetes, AWS, Azure, GCP)
- Chapter 7: Building an Automated Discovery Pipeline
- Chapter 9: Tool Comparison Matrix (20+ tools evaluated)