YAML

Fundamentals Security Notes Jan 6, 2025 YAML

Definition

If you’ve ever looked at a JSON configuration file and thought “there has to be a more readable way to write this,” you’ve discovered the problem YAML was designed to solve. YAML (which recursively stands for “YAML Ain’t Markup Language”) is a data serialization format designed to be easy for people to read and write while still being unambiguous for programs to parse.

What makes YAML special is its use of indentation to show relationships - just like how an outline organizes information. Instead of curly braces and brackets everywhere (like JSON), YAML uses whitespace and simple punctuation. This makes it much easier to read and write by hand, which is why it’s become the go-to format for configuration files, API specifications, and infrastructure-as-code tools.
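
As a small sketch of that contrast, here is a hypothetical server configuration (the keys and values are invented for illustration). The comment at the end shows the same data as JSON:

```yaml
# Nesting is expressed by indentation; list items start with "- ".
server:
  host: example.com
  port: 8080
  tags:
    - web
    - production

# Equivalent JSON:
# {"server": {"host": "example.com", "port": 8080, "tags": ["web", "production"]}}
```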

You’ve probably encountered YAML without realizing it. GitHub Actions workflows? Written in YAML. Kubernetes deployment files? YAML. Docker Compose files? YAML. OpenAPI specifications? Usually YAML. When developers need to configure something complex but still want it to be human-readable, they often reach for YAML. It’s the difference between reading a well-organized outline and trying to parse a wall of brackets and commas.

Example

CI/CD Pipelines: When you set up GitHub Actions to automatically test and deploy your code, you write the workflow in YAML. The file describes steps like “install dependencies, run tests, deploy to production” in a format that’s easy to read and modify.
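
A minimal workflow might look like the sketch below. It assumes a Node.js project with an npm test script; the workflow and job names are arbitrary, and a deploy step is left out because it depends on the hosting target:

```yaml
# .github/workflows/ci.yml
name: CI
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # fetch the repository
      - run: npm ci                 # install dependencies
      - run: npm test               # run tests
```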

Kubernetes Deployments: If you’re deploying containers to Kubernetes, you describe your entire infrastructure in YAML files. How many replicas? What image to use? Which ports to expose? All specified in readable YAML that you can version control alongside your code.
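
The three questions above map directly onto fields of a Deployment manifest. This is a minimal sketch; the name web and the nginx image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # how many replicas?
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web               # must match the selector above
    spec:
      containers:
        - name: web
          image: nginx:1.27    # what image to use?
          ports:
            - containerPort: 80  # which ports to expose?
```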

Docker Compose: When you need to run multiple containers together (like a web app with a database), docker-compose.yml defines the whole setup. “Run this image, connect to this network, mount this volume” - all in clear, structured YAML.
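
A sketch of the web-app-plus-database case follows. The service names, image tags, and the DB_PASSWORD variable are assumptions for illustration; Compose places both services on a default network unless told otherwise:

```yaml
# docker-compose.yml
services:
  web:
    image: nginx:1.27
    ports:
      - "8080:80"              # quoted so the mapping stays a string
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # substituted by Compose from the environment
    volumes:
      - db-data:/var/lib/postgresql/data  # mount a named volume

volumes:
  db-data:
```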

OpenAPI/Swagger Specifications: When you document an API, the spec is typically written in YAML. It describes endpoints, parameters, responses, and authentication in a format that tools can use to generate documentation, SDKs, and mock servers.

Application Configuration: Many applications use YAML for their config files because it’s easier for humans to edit than JSON. Rails applications, for example, use config/database.yml to configure database connections across environments.
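
Below is a trimmed sketch of the kind of file Rails uses; the database names are hypothetical. It also shows anchors (&default), aliases (*default), and the merge key (<<), which let several environments share one set of defaults:

```yaml
# config/database.yml
default: &default        # anchor: name this mapping "default"
  adapter: postgresql
  encoding: unicode
  pool: 5

development:
  <<: *default           # merge the shared settings in
  database: app_development

test:
  <<: *default
  database: app_test
```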

Analogy

The Organized Outline: YAML is like writing an outline for a research paper. You use indentation to show what belongs under what:

  • Chapter 1: Introduction
    • What is the problem?
    • Why does it matter?
  • Chapter 2: Methods
    • Data collection
    • Analysis approach

Compare this to JSON, which would be like writing the same outline using only parentheses and commas to show structure. Both convey the same information, but the outline is much easier to scan and edit.

The Tax Form vs. The Conversation: JSON is like filling out a rigid tax form with specific boxes for each value. YAML is like explaining the same information conversationally: “My income was $50,000, I had $10,000 in deductions, and I’m claiming two dependents.” Both communicate the data, but one feels more natural.

The Recipe Card: A YAML file is like a well-organized recipe card:

  • Ingredients:
    • flour: 2 cups
    • sugar: 1 cup
  • Steps:
    • Mix dry ingredients
    • Add wet ingredients
    • Bake at 350F

It’s structured enough for a computer to parse, but formatted naturally enough that a human can follow it without any training.
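
Written as actual YAML, the card above might look like this (a sketch; the key names are chosen for illustration). The ingredients become a mapping and the steps become an ordered list:

```yaml
ingredients:
  flour: 2 cups
  sugar: 1 cup
steps:
  - Mix dry ingredients
  - Add wet ingredients
  - Bake at 350F
```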

Assembly Instructions: Think of IKEA furniture instructions. They use visual indentation and simple symbols to show which pieces go together and in what order. YAML does the same thing with text - using indentation and simple punctuation to show relationships between pieces of data, making it easy to follow even when the assembly (configuration) is complex.

Code Example


# OpenAPI specification in YAML
openapi: 3.0.0
info:
  title: User API
  version: 1.0.0
  description: API for managing users

paths:
  /users:
    get:
      summary: List all users
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/User'

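  /users/{id}:
    get:
      summary: Get user by ID
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      # OpenAPI 3.0 requires a responses object on every operation
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
        '404':
          description: User not found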

components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: integer
        name:
          type: string
        email:
          type: string
          format: email

Security Notes

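CRITICAL: YAML is a data serialization language, not executable code, but some parsers construct arbitrary objects while loading it. Treat YAML from untrusted sources as hostile input.

YAML Vulnerabilities:

  • Code execution: Unsafe loaders (for example PyYAML’s yaml.load with an unsafe Loader) can instantiate arbitrary objects and call functions named in language-specific tags
  • Object deserialization: Custom tags can build application objects the code never expected to receive
  • Alias expansion: Nested anchors and aliases can expand a tiny file into a huge in-memory structure (the “billion laughs” attack). YAML has no external entities, so XML’s XXE attack does not apply directly
  • Injection: Untrusted values interpolated into a YAML template can add or override keys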
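
To make two of these concrete, the fragment below sketches what hostile input can look like. The first entry uses a PyYAML-specific tag and only has an effect under an unsafe loader; a safe loader rejects it with an error. The second is a deliberately small alias-expansion example; the key names are arbitrary:

```yaml
# 1. Language-specific tag: an unsafe PyYAML loader would call os.system here.
exploit: !!python/object/apply:os.system ["echo pwned"]

# 2. Alias expansion: each level references the previous one four times,
#    so the expanded size grows as 4^depth. Real attacks add more levels.
a: &a ["lol", "lol", "lol", "lol"]
b: &b [*a, *a, *a, *a]
c: &c [*b, *b, *b, *b]
d: [*c, *c, *c, *c]
```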

Safe Parsing:

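  • Safe loader: Use the parser’s safe loading mode (for example yaml.safe_load in PyYAML), which builds only plain maps, lists, and scalars
  • No code execution: Reject language-specific tags and custom constructors for untrusted input
  • Validation: Validate the parsed structure before using it
  • Schema enforcement: Check documents against a schema (for example JSON Schema)
  • Whitelist types: Only allow known types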
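
Schema enforcement can itself be expressed in YAML. The sketch below is a JSON Schema written as YAML; it assumes a validator that accepts JSON Schema and a hypothetical config with name and replicas keys:

```yaml
$schema: "https://json-schema.org/draft/2020-12/schema"
type: object
additionalProperties: false   # reject keys the application does not know
required: [name, replicas]
properties:
  name:
    type: string
  replicas:
    type: integer
    minimum: 1
```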

Common Issues:

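  • Unsafe deserialization: Loading untrusted YAML with an unsafe loader
  • External references: Tools that resolve references to other files or URLs (such as $ref in OpenAPI) can be pointed at unintended resources
  • Type coercion: Unquoted values can change type unexpectedly; YAML 1.1 parsers read no as a boolean and 1.10 as the float 1.1
  • Comment confusion: A # preceded by whitespace starts a comment, so an unquoted value such as abc #123 is silently cut to abc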
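
The coercion and comment pitfalls are easiest to see side by side. The results in the comments describe parsers that follow YAML 1.1 (PyYAML, for example); other parsers may differ, and the key names are illustrative:

```yaml
country: no              # boolean false, not the string "no"
country_quoted: "no"     # string "no"

version: 1.10            # float 1.1
version_quoted: "1.10"   # string "1.10"

port: 22:22              # base-60 integer 1342
port_quoted: "22:22"     # string "22:22"

password: abc #123       # string "abc"; the rest is a comment
password_quoted: "abc #123"
```

Quoting any value that must stay a string avoids all four surprises.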

Best Practices:

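  • Use safe parsers: Prefer maintained libraries and call their safe loading API by default
  • Disable features: Turn off custom tags, and cap alias expansion, nesting depth, and document size where the parser allows
  • Validate input: Validate structure and types after parsing
  • Schema: Define and enforce a schema
  • Sanitize: Quote or escape untrusted values before writing them into YAML, and prefer a YAML library’s serializer over string concatenation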

Configuration Security:

  • Secrets: Don’t store secrets in YAML files
  • Environment variables: Use env vars for sensitive values
  • Encryption: Encrypt sensitive configuration
  • Access control: Restrict access to config files
  • Audit: Log configuration changes
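
One way to follow the first two points in Kubernetes is to reference a Secret instead of writing the value inline. This is a fragment of a container spec, not a complete manifest; the Secret name db-credentials and the key password are hypothetical:

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials   # Secret object managed outside this file
        key: password
```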

Standards & RFCs
