PHP URLencode VS RAWURLencode – What Is The Difference!?

Welcome to a quick tutorial on the difference between urlencode and rawurlencode in PHP. So what are these used for, and why are there two different versions of “encode URL”? Which one is correct and which should we use?

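  • A URL cannot contain literal white spaces, and characters such as @, +, and % have special meanings inside a URL. For example, http://site.com/my path/?email=jon@doe.com is not safe to use as it stands.
  • To solve this problem, we do URL encoding to change each “illegal character” into a percent sign followed by its hex code (or a plus sign, for white spaces in a query string).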
    • Use rawurlencode() to encode the URL path and file name – $url = "http://site.com/". rawurlencode("my path") . "/";
    • Use urlencode() to encode the query string – $q = "?email=". urlencode("jon@doe.com");

That covers the quick basics, but why is this necessary? Can’t we just do a “single universal encode” on both URL path and query string? Read on to find out!

ⓘ I have included a zip file with all the example source code at the start of this tutorial, so you don’t have to copy-paste everything… Or if you just want to dive straight in.

 

 


 

DOWNLOAD & NOTES

Firstly, here is the download link to the example code as promised.

 

EXAMPLE CODE DOWNLOAD

Click here to download all the example source code. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.

 

QUICK NOTES

If you spot a bug, please feel free to comment below. I try to answer questions too, but it is one person versus the entire world… If you need answers urgently, please check out my list of websites to get help with programming.

 

 

PHP URL ENCODING

All right, let us now get into the examples and explanation of URL encoding in PHP.

 

WHY ARE THERE TWO “URL ENCODE” FUNCTIONS?

Before some of you guys start to complain that “PHP is so stupid” – Nope, this has nothing to do with the PHP developers. The two functions exist because different parts of a URL are governed by different standards, written at different times.

  • The file path part of the URL follows RFC 3986 (section 2 if you want the specifics) – We use rawurlencode() to handle that.
  • The query string part of the URL follows the form encoding described in RFC 1866 (section 8.2.1 to be exact) – We use urlencode() to handle that.
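
To make the difference concrete, here is a minimal sketch that runs the same input through both functions, then decodes a literal plus sign with each matching decoder. The sample strings are my own and are not part of the download.

```php
<?php
// Hypothetical sample input; any string with a white space will do.
$input = "my path";

echo rawurlencode($input) . PHP_EOL; // my%20path (RFC 3986 style)
echo urlencode($input) . PHP_EOL;    // my+path   (form / query string style)

// The decoders differ in the same way: only urldecode() treats "+" as a space.
echo rawurldecode("a+b") . PHP_EOL;  // a+b
echo urldecode("a+b") . PHP_EOL;     // a b
```

In other words, a plus sign only means “white space” under the query string rules, which is why mixing up the two styles can silently change the data.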

 

RAW URL ENCODE EXAMPLE

1-raw-url-encode.php
<?php
// (A) THE FUNKY URL
$url = "http://site.com/my path/my@+%pic.jpg";
 
// (B) BREAK URL INTO PARTS
$parts = parse_url($url);
if (isset($parts["path"])) {
  $parts["path"] = trim($parts["path"], "/");
  $parts["path"] = explode("/", $parts["path"]);
}
 
// (C) COMBINE & ENCODE PATH/FILE ONLY
$encoded = $parts["scheme"] . "://" . $parts["host"];
if (isset($parts["path"])) { foreach ($parts["path"] as $part) {
  $encoded .= "/";
  $encoded .= rawurlencode($part);
}}
 
// (D) THE RESULT
// http://site.com/my%20path/my%40%2B%25pic.jpg
print_r($parts);
echo $encoded;

Why do we have to go through so much trouble to encode the URL? The answer is pretty obvious – If we do rawurlencode("http://site.com/my path/my@+%pic.jpg"), that ends with a funky http%3A%2F%2Fsite.com%2Fmy%20path%2Fmy%40%2B%25pic.jpg. For you guys who want more specifics on what rawurlencode() does:

  • All non-alphanumeric characters will be replaced with a percent sign (%), followed by the corresponding hex code.
  • Dash, underscore, period, and tilde (-_.~) are exceptions (will not be replaced).
  • White spaces will be replaced with %20.
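
If you only have a path to deal with, the steps above can be condensed into a small helper. This is just a sketch; encode_path() is a hypothetical name of my own, not a PHP built-in.

```php
<?php
// Hypothetical helper: encode every segment of a path, but keep the slashes.
function encode_path(string $path): string {
  return implode("/", array_map("rawurlencode", explode("/", $path)));
}

echo encode_path("/my path/my@+%pic.jpg") . PHP_EOL;
// /my%20path/my%40%2B%25pic.jpg

// Dash, underscore, period, and tilde pass through untouched.
echo rawurlencode("a-b_c.d~e") . PHP_EOL; // a-b_c.d~e
```

Take note that this assumes every slash in the input is a path separator. A file name that contains a slash of its own will need to be encoded separately before it is joined into the path.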

 

 

URL ENCODE EXAMPLE

2-url-encode.php
<?php
// (A) SOME DUMMY DATA TO APPEND
$data = [
  "name" => "John Doe",
  "email" => "john@doe.com"
];
 
// (B) CREATE QUERY STRING USING URLENCODE
$query = "";
foreach ($data as $key=>$value) {
  $query .= urlencode($key) . "=" . urlencode($value) . "&";
}
$query = substr($query, 0, -1); // strip the last &
 
// (C) FULL URL WITH QUERY STRING
// http://site.com/api?name=John+Doe&email=john%40doe.com
$url = "http://site.com/api?$query";
echo $url;

Once again, notice how we don’t just “encode the entire string”? Each key and value is encoded on its own, so the = and & separators are left intact. For you guys who want more specifics on urlencode():

  • All non-alphanumeric characters will be replaced with a percent sign (%), followed by the corresponding hex code.
  • Dash, underscore, period (-_.) are exceptions and left as they are.
  • White spaces will be replaced with the plus (+) sign.
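
As an alternative to the manual loop, PHP also has a built-in http_build_query() function that encodes the keys and values for us. It is not used in the example above, so treat this as an optional sketch. The separator is passed in explicitly here so the output does not depend on the server settings.

```php
<?php
$data = [
  "name" => "John Doe",
  "email" => "john@doe.com"
];

// Default encoding type (PHP_QUERY_RFC1738) matches urlencode(): spaces become "+".
$form = http_build_query($data, "", "&");
echo $form . PHP_EOL; // name=John+Doe&email=john%40doe.com

// PHP_QUERY_RFC3986 matches rawurlencode(): spaces become "%20".
$raw = http_build_query($data, "", "&", PHP_QUERY_RFC3986);
echo $raw . PHP_EOL; // name=John%20Doe&email=john%40doe.com
```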

 

 

IS IT REALLY NECESSARY?

Thankfully, most modern browsers are smart enough to do automatic URL encoding. For example, say we forgot to encode the file name in <a href="my pic.jpg">. Most modern browsers will automatically request my%20pic.jpg instead.

However, URL encoding is still necessary if you are working with server-to-server (cURL, fetch, sockets, etc…) calls, where you cannot count on the client to fix the URL for you. It’s painful, but that’s just how the Internet works.
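
For server-to-server calls, one possible approach is to build the whole URL with a helper before handing it over to the client. The sketch below combines both functions; build_url() and the sample values are hypothetical, and the cURL part is left as comments since it needs a real endpoint to call.

```php
<?php
// Hypothetical helper (not from the download): builds an encoded URL from
// a base, raw path segments, and raw query parameters.
function build_url(string $base, array $segments, array $params = []): string {
  $url = rtrim($base, "/");

  // Path and file name: rawurlencode() each segment.
  foreach ($segments as $segment) {
    $url .= "/" . rawurlencode((string) $segment);
  }

  // Query string: urlencode() each key and value.
  if ($params) {
    $pairs = [];
    foreach ($params as $key => $value) {
      $pairs[] = urlencode((string) $key) . "=" . urlencode((string) $value);
    }
    $url .= "?" . implode("&", $pairs);
  }
  return $url;
}

$url = build_url("http://site.com", ["my path", "api"], ["email" => "jon@doe.com"]);
echo $url . PHP_EOL;
// http://site.com/my%20path/api?email=jon%40doe.com

// The encoded URL can then be passed to a client such as cURL:
// $ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $response = curl_exec($ch);
```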

 

USEFUL BITS & LINKS

That’s all for this guide, and here is a small section on some extras that may be useful to you.

 

QUICK SUMMARY – RAWURLENCODE VS URLENCODE

| | rawurlencode | urlencode |
| --- | --- | --- |
| Standard | RFC 3986 | RFC 1866 |
| Used For | Encoding URL path and file name. | Encoding query string. |
| Character Replace | Non-alphanumeric characters are replaced with a percent sign and corresponding hex code. | Non-alphanumeric characters are replaced with a percent sign and corresponding hex code. |
| Exceptions | Dash, underscore, period, and tilde are not replaced. | Dash, underscore, period are not replaced. |
| White Spaces | White spaces are replaced with %20. | White spaces are replaced with a plus sign. |

 


 

THE END

Thank you for reading, and we have come to the end of this guide. I hope that it has helped you to better understand URL encoding in PHP, and if you have anything to share about this guide, please feel free to comment below. Good luck and happy coding!
