Secrets of Secure Development in Drupal: Key Functions

By admin, 28 October, 2024

  public static function filter($string, ?array $allowed_html_tags = NULL) { 
    if (is_null($allowed_html_tags)) { 
      $allowed_html_tags = static::$htmlTags; 
    } 
    // Only operate on valid UTF-8 strings. This is necessary to prevent cross 
    // site scripting issues on Internet Explorer 6. 
    if (!Unicode::validateUtf8($string)) { 
      return ''; 
    } 
    // Remove NULL characters (ignored by some browsers). 
    $string = str_replace(chr(0), '', $string); 
    // Remove Netscape 4 JS entities. 
    $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string); 

    // Defuse all HTML entities. 
    $string = str_replace('&', '&amp;', $string); 
    // Change back only well-formed entities in our list of allowed html tags: 
    // Decimal numeric entities. 
    $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string); 
    // Hexadecimal numeric entities. 
    $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string); 
    // Named entities. 
    $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string); 
    $allowed_html_tags = array_flip($allowed_html_tags); 
    // Late static binding does not work inside anonymous functions. 
    $class = static::class; 
    $splitter = function ($matches) use ($allowed_html_tags, $class) { 
      return $class::split($matches[1], $allowed_html_tags, $class); 
    }; 
    // Strip any tags that are not in the list of allowed html tags. 
    return preg_replace_callback('% 
      ( 
      <(?=[^a-zA-Z!/])  # a lone < 
      |                 # or 
      <!--.*?-->        # a comment 
      |                 # or 
      <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string 
      |                 # or 
      >                 # just a > 
      )%x', $splitter, $string); 
  }

This function is a static method that filters an input string, $string, by removing potentially unsafe HTML and JavaScript content, as well as handling HTML entities, to protect against cross-site scripting (XSS) attacks.

Here’s a breakdown of what it does step-by-step:

1. Setting Allowed Tags:
- If $allowed_html_tags is not specified, it defaults to static::$htmlTags, which presumably contains a predefined list of allowed HTML tags.

2. UTF-8 Validation:
- Checks that $string is valid UTF-8 using Unicode::validateUtf8($string). If the string is invalid, the function returns an empty string, because malformed UTF-8 can be abused for XSS in some older browsers (the code comment names Internet Explorer 6).

3. Character and Script Removal:
- Removes NULL characters from the string to avoid any browser-specific issues with handling such characters.
- Strips out Netscape 4 JavaScript entities using a regex that targets patterns like &{...}.

4. Defusing HTML Entities:
- Converts all & characters to &amp; to neutralize existing HTML entities, effectively disabling them temporarily.

5. Restoring Allowed Entities:
- Converts &amp; back to & for specific entities:
  - Decimal numeric entities (&#nnnn;).
  - Hexadecimal numeric entities (&#xhhhh;).
  - Named entities (like &nbsp;).

6. Filtering Tags:
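- The $splitter function (a callback) is used to filter out tags that are not in the list of $allowed_html_tags. As the code comment notes, late static binding does not work inside the anonymous function, so the class name is captured beforehand as static::class in $class and passed into the closure. The closure then calls $class::split, a method presumably defined in this class to decide how each matched fragment is processed based on the allowed tags.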

7. Final HTML Tag Filtering:
- A preg_replace_callback call with a regex pattern finds every tag-like fragment and hands it to $splitter, which removes any HTML tags not in $allowed_html_tags:
  - The pattern matches isolated < characters, HTML comments, any string that begins with < and runs to the next > or to the end of the string (which covers incomplete tags), and lone > characters.
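
Steps 3 to 5 can be tried in isolation, outside Drupal. The following is a minimal sketch that reuses the regular expressions from the listing above; the function name prepareEntities() is hypothetical and is not part of Drupal's API.

```php
<?php
// Standalone sketch of steps 3-5. prepareEntities() is a hypothetical
// helper for experimentation, not a Drupal function.
function prepareEntities(string $string): string {
  // Step 3: drop NULL bytes and Netscape 4 JS entities such as &{...};
  $string = str_replace(chr(0), '', $string);
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
  // Step 4: defuse every ampersand.
  $string = str_replace('&', '&amp;', $string);
  // Step 5: restore only well-formed decimal, hexadecimal and named entities.
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  return $string;
}

echo prepareEntities('Tom & Jerry &nbsp; &#169; &#xA9; &bogus'), "\n";
// Prints: Tom &amp; Jerry &nbsp; &#169; &#xA9; &amp;bogus
```

The bare ampersand and the unterminated &bogus stay defused, while the three well-formed entities are restored.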

The overall result is a sanitized $string that removes unapproved HTML tags and prevents JavaScript code execution, helping to safeguard against XSS vulnerabilities.
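
To see which fragments the final pattern hands to the callback, the same regular expression can be run with preg_match_all. This is only an illustrative sketch: tagFragments() is a hypothetical helper, and it lists the matches rather than filtering them the way split does.

```php
<?php
// The pattern that filter() passes to preg_replace_callback().
$pattern = '%
  (
  <(?=[^a-zA-Z!/])  # a lone <
  |                 # or
  <!--.*?-->        # a comment
  |                 # or
  <[^>]*(>|$)       # a tag-like string, possibly unterminated
  |                 # or
  >                 # just a >
  )%x';

// Hypothetical helper: return the fragments the callback would receive.
function tagFragments(string $pattern, string $html): array {
  preg_match_all($pattern, $html, $matches);
  // Group 1 is the outer group, i.e. the whole matched fragment.
  return $matches[1];
}

echo implode(' | ', tagFragments($pattern, '<p>1 < 2</p><!-- note --><script>x')), "\n";
// Prints: <p> | < | </p> | <!-- note --> | <script>
```

Plain text such as "1", "2" and "x" never reaches the callback; only the tag-like fragments do, so the decision to keep or drop each one rests entirely with split.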

public static function escape($text): string {
  if (is_null($text)) {
    @trigger_error('Passing NULL to ' . __METHOD__ . ' is deprecated in drupal:9.5.0 and will trigger a PHP error from drupal:11.0.0. Pass a string instead. See https://www.drupal.org/node/3318826', E_USER_DEPRECATED);
    return '';
  }
  return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

The escape function is a static method that sanitizes input text to prevent XSS attacks by safely displaying HTML code on a page. Here’s a breakdown of how it works:

1. NULL Check:
- If the $text parameter is NULL, the function raises a deprecation notice (E_USER_DEPRECATED) through @trigger_error. The message states that passing NULL to escape is deprecated in Drupal 9.5.0 and will trigger a PHP error from Drupal 11.0.0, and it advises passing a string instead. In this case the function returns an empty string ''.

2. Safe Text Escaping:
- If $text is not NULL, it gets sanitized using the htmlspecialchars function, with the following key parameters:
  - ENT_QUOTES | ENT_SUBSTITUTE: ENT_QUOTES converts both single and double quotes into HTML entities; ENT_SUBSTITUTE replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD) instead of returning an empty string.
  - 'UTF-8': specifies UTF-8 encoding, ensuring proper handling of multilingual text.

The function returns sanitized text where special HTML characters are converted to entities (e.g., < becomes &lt;), allowing HTML to be safely displayed on the page without execution.

This function is essential for XSS protection, escaping user input before displaying it on the page.
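
The effect of these flags is easy to check with plain PHP. The sketch below wraps the same htmlspecialchars call in a hypothetical escapeSketch() function so it can run without Drupal; it leaves out the NULL handling.

```php
<?php
// Hypothetical stand-in for the escaping branch of escape().
function escapeSketch(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escapeSketch('<a href="x" title=\'y\'>Tom & Jerry</a>'), "\n";
// Prints: &lt;a href=&quot;x&quot; title=&#039;y&#039;&gt;Tom &amp; Jerry&lt;/a&gt;

// Thanks to ENT_SUBSTITUTE, an invalid byte is replaced with U+FFFD
// and the rest of the string survives.
var_dump(escapeSketch("a\xFFb") === "a\u{FFFD}b");
```

Without ENT_SUBSTITUTE, htmlspecialchars returns an empty string for input containing invalid UTF-8, which would silently discard the whole value.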

public static function stripDangerousProtocols($uri) {
  $allowed_protocols = array_flip(static::$allowedProtocols);

  // Iteratively remove any invalid protocol found.
  do {
    $before = $uri;
    $colon_position = strpos($uri, ':');
    if ($colon_position > 0) {
      // We found a colon, possibly a protocol. Verify.
      $protocol = substr($uri, 0, $colon_position);
      // If a colon is preceded by a slash, question mark or hash, it cannot
      // possibly be part of the URL scheme. This must be a relative URL, which
      // inherits the (safe) protocol of the base document.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      // Check if this is a disallowed protocol. Per RFC2616, section 3.2.3
      // (URI Comparison) scheme comparison must be case-insensitive.
      if (!isset($allowed_protocols[strtolower($protocol)])) {
        $uri = substr($uri, $colon_position + 1);
      }
    }
  } while ($before != $uri);

  return $uri;
}

The stripDangerousProtocols function is a static method that removes potentially unsafe protocols (such as javascript: or data:) from a URI to prevent XSS attacks. Here’s a breakdown of its process:

1. Allowed Protocols:
- It flips the static::$allowedProtocols array to create $allowed_protocols, an associative array of approved protocols, making it easier to check against them.

2. Iterative Protocol Removal:
- The function repeatedly removes any disallowed protocols. This loop will continue until the URI no longer changes, meaning all unsafe protocols have been stripped.

3. Protocol Verification:
- Inside the loop, it looks for the first colon (:) in $uri, assuming this might mark the end of a protocol prefix.
- If a colon is found, it checks that the preceding characters don’t include /, ?, or #. If any of these characters precede the colon, the URI is treated as a relative URL, and the loop breaks.

4. Disallowed Protocol Check:
- If the protocol is not in the list of allowed protocols (checked case-insensitively), it is considered unsafe. The function removes it by setting $uri to the substring after the colon, thereby stripping the disallowed prefix.

5. Return:
- Once all invalid protocols are removed, the function returns the sanitized $uri.

This function is crucial for security, as it removes unsafe protocol prefixes from URIs, thereby mitigating XSS risks.
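
The loop can be exercised outside Drupal with a standalone copy. The following is a sketch under stated assumptions: stripProtocolsSketch() is a hypothetical name, and its default protocol list is picked for the demo rather than taken from Drupal's configuration.

```php
<?php
// Standalone sketch of the loop above. The default protocol list is an
// assumption for this demo, not Drupal's configured value.
function stripProtocolsSketch(string $uri, array $allowed = ['http', 'https', 'mailto']): string {
  $allowed = array_flip($allowed);
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon > 0) {
      $protocol = substr($uri, 0, $colon);
      // A slash, question mark or hash before the colon means a relative URL.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      // Strip the prefix when the scheme is not allowed (case-insensitive).
      if (!isset($allowed[strtolower($protocol)])) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before != $uri);
  return $uri;
}

echo stripProtocolsSketch('javascript:javascript:alert(1)'), "\n"; // alert(1)
echo stripProtocolsSketch('HTTPS://example.com/a:b'), "\n";        // HTTPS://example.com/a:b
echo stripProtocolsSketch('/node/1?x=a:b'), "\n";                  // /node/1?x=a:b
```

The first call shows why the removal is iterative: stripping one javascript: prefix exposes a second one, which the next pass removes.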
