
HTTP

Posted on September 1, 2018. Filed under: Network security, SANS Dev 541

HTTP is everywhere. Fish swim in water, but do they really know what water is?

Seriously, what is HTTP?

First of all,

What are an HTTP client and an HTTP server?

HTTP stands for Hypertext Transfer Protocol. It is based on a client-server architecture: the client generates an HTTP request, and the server answers it with an HTTP response.

HTTP client/server pair

The client is usually a web browser such as Firefox, IE, Chrome, Safari or Opera, running on your PC, iPad, Android device or even a digital watch. Essentially, the client is any program that can generate an HTTP request. A desktop Java program that uses Apache HttpClient acts as an HTTP client. An HTTP client can be as simple as a shell command such as curl or wget. An Ajax web page can also embed an HTTP client; for example, the following JavaScript (jQuery) code is HTTP client code that issues a GET request.

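// When any <button> is clicked, send a GET request to demo_test.asp
// and show the response body and the request status in an alert box.
$("button").click(function(){
    $.get("demo_test.asp", function(data, status){
        alert("Data: " + data + "\nStatus: " + status);
    });
});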

The server is usually a web server program running on an OS. The OS and underlying hardware could be anything: Linux, macOS, Windows, uClinux, etc., running on a server, PC, Android phone, router, switch, smart thermometer, virtual machine, Docker container and so on. We put all of that into a black box labelled "infrastructure" and focus only on the server software. Some example server software: the Apache web server (written in C), a Node.js web server program, a Tomcat web container running a Java servlet, a Jetty web container running a Jersey REST service written in Java, or something as simple as a shell command.

Example 1: curl as the HTTP client and the nc command as the HTTP server.

client commands

client>curl localhost:8080
Tue Jul 28 16:27:32 EDT 2020
client>curl -d "param1=value1&param2=value2" -X POST http://localhost:8080
Tue Jul 28 16:27:43 EDT 2020
client>curl -X PUT -d arg=val -d arg2=val2 localhost:8080
Tue Jul 28 16:27:55 EDT 2020
client>curl -X HEAD -I localhost:8080
HTTP/1.1 200 OK
Content-Type: text/html

server commands

server>while true; do echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n$(date)" | nc -l 8080; done
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*

POST / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

param1=value1&param2=value2PUT / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 17
Content-Type: application/x-www-form-urlencoded

arg=val&arg2=val2HEAD / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*

Example 2: Chrome as the HTTP client and the server application behind https://www.w3schools.com/jquery/jquery_ajax_get_post.asp as the HTTP server.

What is invisible at the software level is the network activity underneath: below the HTTP implementation, a pair of sockets are the real heroes. The HTTP response (a string in the HTTP response format: HTTP/1.1 x00 xx\r\n\r\n xxx) is written into a socket as bytes on the server side, comes out of a socket as bytes on the client side, and is turned back into the response string, a process software engineers loosely think of as serialization and deserialization. The client program (curl) then reads the HTTP response string. Because it understands the HTTP protocol, it knows that "HTTP/1.1 200 OK\r\nContent-Type: text/html" is the header, "\r\n\r\n" is the delimiter between header and content, and "Mon Feb 26 15:23:43 EST 2018" is the content. So the client can choose to display or hide the header and the content separately.
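To make the parsing step concrete, here is a minimal JavaScript sketch that splits a raw response at the first "\r\n\r\n" into status line, headers and body. `parseHttpResponse` is a hypothetical helper, not curl's code; it assumes the whole response is already in one string and ignores real-world framing such as Content-Length and chunked encoding.

```javascript
// Minimal sketch: split a raw HTTP/1.1 response string into its parts.
// Assumption: the entire response is already in memory; no chunked encoding.
function parseHttpResponse(raw) {
  // The first empty line ("\r\n\r\n") separates the header block from the body.
  const sep = raw.indexOf("\r\n\r\n");
  const head = sep === -1 ? raw : raw.slice(0, sep);
  const body = sep === -1 ? "" : raw.slice(sep + 4);

  const lines = head.split("\r\n");
  const statusLine = lines[0];
  const headers = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed header lines
    // Header names are case-insensitive, so store them in lower case.
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { statusLine, headers, body };
}

// The nc one-liner's reply; echo -e adds a trailing newline to the body.
const raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\nTue Jul 28 16:27:32 EDT 2020\n";
const res = parseHttpResponse(raw);
console.log(res.statusLine);              // HTTP/1.1 200 OK
console.log(res.headers["content-type"]); // text/html
console.log(res.body.trim());             // Tue Jul 28 16:27:32 EDT 2020
```

This is the same split that lets curl print only the body by default and show the header with `-i` or `-v`.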

Now we have seen that a minimal HTTP server can be a shell command that is able to open a server socket; it does not have to understand the HTTP protocol at all. It listens on the given port, and when a client connects, nc writes out whatever echo produced: the header string "HTTP/1.1 200 OK\r\nContent-Type: text/html", the blank line, and the content string. The reply does not depend on the request at all. A well-behaved HTTP server, on the other hand, should generate correct headers ALL THE TIME, according to the HTTP rules detailed in a long and boring set of documents, RFC 7230-7235.
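As a rough sketch of what "correct headers" adds on top of the one-liner, the hypothetical helper below builds a response with a proper status line, CRLF line endings and a Content-Length header, so the client knows where the body ends. It assumes the caller does not pass its own Content-Length.

```javascript
// Sketch: build a well-formed HTTP/1.1 response string.
// buildHttpResponse is a hypothetical helper, not part of any library.
function buildHttpResponse(statusCode, reasonPhrase, headers, body) {
  // Content-Length counts bytes (octets), not characters, so encode as UTF-8 first.
  const length = new TextEncoder().encode(body).length;
  const lines = [`HTTP/1.1 ${statusCode} ${reasonPhrase}`];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`);
  }
  lines.push(`Content-Length: ${length}`);
  // An empty line (CRLF CRLF) ends the header section; the body follows.
  return lines.join("\r\n") + "\r\n\r\n" + body;
}

const response = buildHttpResponse(200, "OK", { "Content-Type": "text/html" }, "hello");
console.log(JSON.stringify(response));
// "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
```

Without Content-Length (as in the nc one-liner), the client can only find the end of the body when the server closes the connection.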

curl is a simple HTTP client. A minimal HTTP client is a program that understands a little bit of the HTTP protocol: it can generate a request string such as "GET / HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\n\r\n", write it into a client socket, and then wait for the server's response string. Instead of using curl, we can have printf produce the request string and have netcat send it to localhost port 8080 directly.
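The request string itself is easy to generate in code. This is a small sketch with a hypothetical `buildHttpRequest` helper; the header values simply mirror the curl transcript above, and the only header HTTP/1.1 insists on is Host.

```javascript
// Sketch: build the request text a minimal HTTP/1.1 client would write to its socket.
// buildHttpRequest is hypothetical; extra header values here copy the curl transcript.
function buildHttpRequest(method, host, path, extraHeaders = {}) {
  // Request line, then the Host header that HTTP/1.1 requires on every request.
  const lines = [`${method} ${path} HTTP/1.1`, `Host: ${host}`];
  for (const [name, value] of Object.entries(extraHeaders)) {
    lines.push(`${name}: ${value}`);
  }
  // The request head ends with an empty line; a GET carries no body.
  return lines.join("\r\n") + "\r\n\r\n";
}

const req = buildHttpRequest("GET", "localhost:8080", "/", {
  "User-Agent": "curl/7.54.0",
  Accept: "*/*",
});
console.log(req === "GET / HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\n\r\n"); // true
```

In Node.js, the resulting string could be written to a TCP connection opened with `net.connect(8080, "localhost")`, which is what `printf ... | nc localhost 8080` does in the shell.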

client>printf "GET / HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\n\r\n" | nc localhost 8080
HTTP/1.1 200 OK
Content-Type: text/html

Tue Jul 28 16:40:17 EDT 2020
client>

The same strategy can be used to query other web services as well; it has nothing to do with our one-line HTTP server. To prove this point, use curl to query the root resource of a Spring Boot web application listening on port 8080, then use nc to do the same:

client>curl -v localhost:8080
* Rebuilt URL to: localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
< Content-Length: 37
< Date: Wed, 29 Jul 2020 03:34:30 GMT
<
* Connection #0 to host localhost left intact
hello spring web from application.ymlclient>
client>
client>printf "GET / HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\n\r\n" | nc localhost 8080
HTTP/1.1 200
Content-Type: text/plain;charset=UTF-8
Content-Length: 37
Date: Wed, 29 Jul 2020 03:34:37 GMT

hello spring web from application.ymlclient>

A well-behaved HTTP client needs to follow the rules detailed in RFC 7230-7235 and always send correctly formatted requests.

HTTP client and server are two socket endpoints that happen to use HTTP as their protocol

socket communication on TCP/IP network

We have noticed that the server can reply with anything it chooses. For example, no matter what kind of request the client sends, the server could always reply with the same string, "screw you!". Of course, if the server does not strictly follow the HTTP protocol, the client may fail to parse the response or may misunderstand it, and the communication breaks down. For example, a decent web browser may refuse to render "screw you" to the end user if the HTTP header is missing from the response. Other clients handle abnormal content differently: curl, for example, displays the weird response string anyway, and is honest with you about the HTTP non-conformity if you ask.

Client>curl localhost:8080

screw you

Client>curl -X HEAD -I localhost:8080

curl: (8) Weird server reply
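A lenient sketch of the first sanity check a client can make: does the reply start with an HTTP status line? `parseStatusLine` is hypothetical and is not how curl is implemented; the reason phrase is treated as optional because some servers (like the Spring Boot reply above) send none.

```javascript
// Sketch: recognize an HTTP status line ("HTTP/1.1 200 OK") at the start of a reply.
// Returns null when the reply is not HTTP at all, e.g. "screw you".
function parseStatusLine(reply) {
  const firstLine = reply.split("\r\n", 1)[0];
  // HTTP-version SP status-code [SP reason-phrase]
  const m = /^HTTP\/(\d)\.(\d) (\d{3})(?: (.*))?$/.exec(firstLine);
  if (m === null) return null;
  return { version: `${m[1]}.${m[2]}`, code: Number(m[3]), reason: m[4] ?? "" };
}

console.log(parseStatusLine("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")); // { version: '1.1', code: 200, reason: 'OK' }
console.log(parseStatusLine("HTTP/1.1 200\r\n\r\n")); // { version: '1.1', code: 200, reason: '' }
console.log(parseStatusLine("screw you\n"));          // null
```

A client that gets `null` here has nothing HTTP-shaped to work with, which is roughly the situation curl reports as "Weird server reply".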

So HTTP is a protocol: a set of guidelines for effective communication. The WWW, as huge as it is, has no central government. It is a collection of clients and servers that agree to communicate according to the HTTP protocol. If a client or server does not follow the protocol, there is no HTTP police or HTTP court (though search engines may rank the server lower, or rogue-host databases may blacklist it). HTTP is a protocol, a kind of guideline; it is not a rule or law that the client/server pair has to obey. A rogue server that does not follow the HTTP protocol is like a pirate: it can choose to follow some of the protocol or none of it. That is the server implementation's choice.

ghost story of HTTP

This reminds me of a scene in Pirates of the Caribbean.

Barbossa: First, your return to shore was not part of our negotiations nor our agreement, so I must do nothing. And secondly, you must be a pirate for the pirate's code to apply, and you're not. And thirdly, the code is more what you'd call "guidelines" than actual rules. Welcome aboard the Black Pearl, Miss Turner.

So the next question is:

What is HTTP as a communication protocol or “guidelines”?

HTML dress up

HTTP is not HTML

HTTP stands for Hypertext Transfer Protocol, a network communication protocol.
HTML stands for Hypertext Markup Language, a language for tagging text files to achieve font, color, graphic, and hyperlink effects on WWW pages.

An HTTP response can choose to include HTML in the response string. A response without any HTML is no less HTTP: a web service can use JSON or XML as its response content. In our simple HTTP server example:

the response "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n$(date)" can be changed to include some HTML:

while true; do echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html style=\"color:green;\">$(date)</html>" | nc -l 8080; done

The otherwise plain date text, e.g. "Mon Feb 26 15:23:43 EST 2018", will then be displayed in green by browsers that understand HTML.
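To underline that HTML is just one possible payload, here is a sketch that serves the same date three ways. Only the Content-Type header and the body format change; the HTTP framing stays identical. `formatDateBody` is a hypothetical helper.

```javascript
// Sketch: one payload, three content types. formatDateBody is hypothetical.
function formatDateBody(format, dateText) {
  switch (format) {
    case "html":
      return { contentType: "text/html", body: `<html style="color:green;">${dateText}</html>` };
    case "json":
      return { contentType: "application/json", body: JSON.stringify({ date: dateText }) };
    default:
      return { contentType: "text/plain", body: dateText };
  }
}

const date = "Mon Feb 26 15:23:43 EST 2018";
for (const f of ["text", "html", "json"]) {
  const { contentType, body } = formatDateBody(f, date);
  // Same status line and header layout every time; only the payload differs.
  console.log(`HTTP/1.1 200 OK\r\nContent-Type: ${contentType}\r\n\r\n${body}`);
}
```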

HTTP/1.1 is defined by RFC 7230-7235

RFC 7230

Abstract

   The Hypertext Transfer Protocol (HTTP) is a stateless application-
   level protocol for distributed, collaborative, hypertext information
   systems.  This document provides an overview of HTTP architecture and
   its associated terminology, defines the "http" and "https" Uniform
   Resource Identifier (URI) schemes, defines the HTTP/1.1 message
   syntax and parsing requirements, and describes related security
   concerns for implementations.

RFC 7231

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines the semantics of HTTP/1.1 messages, as expressed by request methods, request header fields, response status codes, and response header fields, along with the payload of messages (metadata and body content) and mechanisms for content negotiation.

RFC 7232

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP/1.1 conditional requests, including metadata header fields for indicating state changes, request header fields for making preconditions on such state, and rules for constructing the responses to a conditional request when one or more preconditions evaluate to false.

RFC 7233

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines range requests and the rules for constructing and combining responses to those requests.

RFC 7234

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP caches and the associated header fields that control cache behavior or indicate cacheable response messages.

RFC 7235

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypermedia information systems. This document defines the HTTP Authentication framework.

These RFCs amount to a social protocol for machines: clients and hosts that follow these guidelines can communicate effectively, because each side knows what to expect.

communication protocol

As casual web surfers, we don't memorize the HTTP protocol. We delegate HTTP handling to our browser, so our brain only has to know that we want to view that amazing picture, and our finger clicks the link. (Yes, web browser automation tools such as Selenium can do that too nowadays.) The server side has a similar structure. The HTTP protocol is handled by web containers such as Tomcat or Jetty; the server-side programmer only has to code what the HTTP response content should be, and the web container turns it into the correct HTTP header followed by the content, according to the HTTP protocol, and writes it into the server socket.
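The division of labor can be sketched in a few lines. In this illustration, a hypothetical `makeContainer` plays the web container (parses the request line, writes status line and headers), and the handler plays the application programmer (only decides the content). This shows the idea only; it is not how Tomcat or Jetty are actually implemented.

```javascript
// Sketch: a toy "container" wraps a content-only handler into full HTTP responses.
// makeContainer and the handler shape { status, contentType, content } are hypothetical.
function makeContainer(handler) {
  const reasons = { 200: "OK", 404: "Not Found" };
  return function handleRawRequest(rawRequest) {
    // Request line: METHOD SP request-target SP HTTP-version
    const [method, path] = rawRequest.split("\r\n", 1)[0].split(" ");
    const { status = 200, contentType = "text/plain", content } = handler(method, path);
    const bytes = new TextEncoder().encode(content).length;
    // The container, not the application, produces the protocol framing.
    return `HTTP/1.1 ${status} ${reasons[status] ?? ""}\r\n` +
      `Content-Type: ${contentType}\r\n` +
      `Content-Length: ${bytes}\r\n\r\n` + content;
  };
}

// The "application programmer" part: it only decides what the content is.
const app = makeContainer((_method, path) =>
  path === "/" ? { content: "hello" } : { status: 404, content: "no such page" });

console.log(JSON.stringify(app("GET / HTTP/1.1\r\nHost: localhost:8080\r\n\r\n")));
// "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
```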

AJAX is running at Client side

Traditionally, the client makes an HTTP request, and the server receives and processes it. After processing the request, the server sends the full page back to the client in HTML format, and the client displays it.

ajax model

AJAX stands for Asynchronous JavaScript and XML. It is now a group of technologies (HTML, JavaScript, CSS) that make web pages more dynamic and interactive on the client side. AJAX is carried out by an Ajax engine running in the client browser. The browser sends an HTTP request to the server, and the server receives and processes the request. After processing it, the server sends back a page that includes HTML, CSS and JavaScript. That JavaScript code is then executed in the browser by the AJAX engine (the JavaScript interpreter).

At this point, parts of the client page can be updated with AJAX. For example, when the user submits a request such as looking up a dictionary entry, the browser generates a DOM event and hands it to the AJAX engine, which acts as a robot in the middle. The AJAX engine uses the JavaScript code received in the previous response to decide what to do with the DOM event. That code tells the AJAX engine to send an HTTP request to the server using XMLHttpRequest, and the AJAX engine performs the request. The server returns data to the AJAX engine: the dictionary lookup result in XML or JSON format. The AJAX engine receives this data and uses the XMLHttpRequest callback function to decide what to do with the response, for example, generating the HTML code for the part of the DOM that needs updating. The Ajax engine then passes the rendering request to the browser's rendering engine, which refreshes part of the page instead of the whole page.
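The callback step ("decide what to do with the response") can be sketched as a plain function that turns a JSON lookup result into the HTML fragment to swap into the page. The response shape `{word, definitions: [...]}` and the name `renderLookup` are hypothetical; keeping it a pure function lets it run outside a browser.

```javascript
// Sketch of an XMLHttpRequest callback body: JSON lookup result -> HTML fragment.
// Assumed (hypothetical) response shape: {"word": "...", "definitions": ["...", ...]}
function renderLookup(status, responseText) {
  if (status !== 200) return `<p class="error">Lookup failed (HTTP ${status})</p>`;
  const { word, definitions } = JSON.parse(responseText);
  // Escape text before inserting it into HTML so server data cannot inject markup.
  const esc = (s) => String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const items = definitions.map((d) => `<li>${esc(d)}</li>`).join("");
  return `<h3>${esc(word)}</h3><ul>${items}</ul>`;
}

const html = renderLookup(200, '{"word":"socket","definitions":["an endpoint of a connection"]}');
console.log(html); // <h3>socket</h3><ul><li>an endpoint of a connection</li></ul>
```

In a browser, this would typically be wired up as `xhr.onload = () => { target.innerHTML = renderLookup(xhr.status, xhr.responseText); };`, where `target` is a hypothetical element that needs updating. Only that element changes, not the whole page.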

Ajax engine is a robot

The data transfer made by XMLHttpRequest goes over HTTP, and XMLHttpRequest can make almost any type of HTTP request, such as GET, POST or COPY. Because the AJAX engine is a robot, the XMLHttpRequest is handled in the background, and the user may not notice the progress or even the existence of the data transfer.

AJAX engine Security constraints

The AJAX engine is powerful, so browser vendors have to build security constraints into this robot to prevent it from doing evil. One of the most important policies is the Same Origin Policy.

same origin policy

The Same Origin Policy prevents scripts from one website from getting or setting properties of a document loaded from a different site. Two URLs have the same origin only if their scheme, host and port all match. For example, suppose you are on http://www.good.com, and a script asks your AJAX engine to send an HTTP request to load content from http://www.evil.com. Because http://www.good.com and http://www.evil.com are different origins, this is a cross-domain (cross-origin) request, and the Same Origin Policy built into the AJAX engine does not let the script read the response. In this case, browser vendors such as Microsoft (IE), Apple (Safari) and Google (Chrome) are protecting you.
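A small sketch of the origin comparison itself, using the WHATWG `URL` class that browsers and Node.js provide; its `origin` property already accounts for default ports. `sameOrigin` is a hypothetical helper name.

```javascript
// Sketch: two URLs share an origin only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("http://www.good.com/page", "http://www.good.com/api")); // true
console.log(sameOrigin("http://www.good.com/", "http://www.evil.com/"));        // false: different host
console.log(sameOrigin("http://www.good.com/", "https://www.good.com/"));       // false: different scheme
console.log(sameOrigin("http://www.good.com/", "http://www.good.com:8080/"));   // false: different port
console.log(sameOrigin("http://www.good.com/", "http://www.good.com:80/"));     // true: 80 is the default
```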

Software with network capability does not necessarily honor the Same Origin Policy. You can browse websites with any software, but that doesn't mean you should browse websites with just any software on the market.

Relaxation of Same Origin policy for trusted sites

The Web 2.0 trend is to mash up content from multiple sites, and there are tons of web services from Amazon, Google, Microsoft, etc. that provide information. A browser that strictly followed the Same Origin Policy would be useless or quickly obsolete; in other words, mashups from multiple sites were soon unavoidable.

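To perform legitimate cross-domain AJAX, the original XMLHttpRequest was extended to XMLHttpRequest Level 2, which supports Cross-Origin Resource Sharing (CORS). All modern browsers support XMLHttpRequest (XHR) Level 2; only old versions of Internet Explorer (before IE10) lack it.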

On the browser side, the policy now divides cross-domain requests into simple requests and preflighted requests:

If the XMLHttpRequest is a GET, HEAD or POST, no custom HTTP headers are added, and any Content-Type is application/x-www-form-urlencoded, multipart/form-data or text/plain, then it is a simple request, and the browser sends it directly to the server with an Origin header naming the page's origin. The server answers with an HTTP response. If the server allows the cross-domain access, the response contains an extra header, "Access-Control-Allow-Origin", whose value is the requesting page's origin (or "*" for any origin). This header means the server explicitly acknowledges that the cross-domain request can be safely handled. The browser then checks the response: if "Access-Control-Allow-Origin" matches the origin of the page that made the request, the response content is returned to the XMLHttpRequest call; otherwise the browser reports an error and the script cannot read the response.
For any other request, or when a custom header is present, XHR Level 2 first sends an OPTIONS request to the server, asking for the server-side access control policy. This preflight request carries "Access-Control-Request-Method" and "Access-Control-Request-Headers" headers describing the intended request. If the server allows the request, it sends a response with "Access-Control-Allow-Origin" indicating the allowed JavaScript origin, plus "Access-Control-Allow-Methods" and "Access-Control-Allow-Headers" indicating the allowed methods and custom headers. Once the browser receives the OPTIONS response, it checks whether the server has granted permission for the actual request, then sends or drops that request.
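The browser-side decision above can be sketched roughly as follows. This is a simplified reading of the CORS rules in the WHATWG Fetch standard: it checks safelisted methods, header names and Content-Type values only, and ignores the extra limits on header values. All names are hypothetical.

```javascript
// Simplified sketch of "simple vs preflighted" (not the full Fetch standard rules).
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SIMPLE_HEADERS = new Set(["accept", "accept-language", "content-language", "content-type"]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
]);

function isSafelistedHeader(name, value) {
  const lower = name.toLowerCase();
  if (!SIMPLE_HEADERS.has(lower)) return false; // any custom header forces a preflight
  if (lower === "content-type") {
    // Compare only the MIME type, ignoring parameters such as "; charset=UTF-8".
    return SIMPLE_CONTENT_TYPES.has(value.split(";")[0].trim().toLowerCase());
  }
  return true;
}

function isSimpleRequest(method, headers) {
  return SIMPLE_METHODS.has(method.toUpperCase()) &&
    Object.entries(headers).every(([n, v]) => isSafelistedHeader(n, v));
}

// Headers of the OPTIONS preflight sent before a non-simple request.
function preflightHeaders(pageOrigin, method, headers) {
  const unsafe = Object.entries(headers)
    .filter(([n, v]) => !isSafelistedHeader(n, v))
    .map(([n]) => n.toLowerCase())
    .sort();
  const result = { Origin: pageOrigin, "Access-Control-Request-Method": method.toUpperCase() };
  if (unsafe.length > 0) result["Access-Control-Request-Headers"] = unsafe.join(",");
  return result;
}

console.log(isSimpleRequest("GET", {}));                                      // true
console.log(isSimpleRequest("POST", { "Content-Type": "application/json" })); // false
console.log(isSimpleRequest("PUT", {}));                                      // false
console.log(preflightHeaders("http://www.good.com", "PUT", { "X-Token": "abc" }));
```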

pre-flight request

As we can see, the cross-domain request policy is controlled on the server side, specifically by the destination server of the cross-domain request rather than by the original server that served the page. The purpose is to prevent your current web page from attacking other domains; it is not to protect the current web page from bringing in bad content from third parties.
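A sketch of where the decision lives: the destination server chooses which page origins may read its responses, and the browser only enforces that choice. The allowlist and both function names are hypothetical, and the browser check is simplified to requests without credentials.

```javascript
// Hypothetical allowlist kept by the destination server.
const ALLOWED_ORIGINS = new Set(["http://www.good.com"]);

// Server side: echo the caller's Origin back only if it is on the allowlist.
function corsHeadersFor(requestOrigin) {
  if (!ALLOWED_ORIGINS.has(requestOrigin)) return {}; // no header: the browser blocks the read
  // Vary: Origin tells caches that the response differs per requesting origin.
  return { "Access-Control-Allow-Origin": requestOrigin, Vary: "Origin" };
}

// Browser side (simplified, no credentials): may the page's script read the response?
function browserAllowsRead(pageOrigin, responseHeaders) {
  const allowed = responseHeaders["Access-Control-Allow-Origin"];
  return allowed === "*" || allowed === pageOrigin;
}

console.log(browserAllowsRead("http://www.good.com", corsHeadersFor("http://www.good.com"))); // true
console.log(browserAllowsRead("http://www.evil.com", corsHeadersFor("http://www.evil.com"))); // false
```

Note that the allowlist protects the destination server's data; nothing here stops the page from receiving bad content from a server that is happy to share it.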

What prevents the current web page from bringing in bad content is a set of proactive defenses and operational security measures maintained on the server side, for example input/output validation, anti-CSRF, honeypots, anti-automation and two-factor authentication, which sites such as Google, Facebook and MySpace can provide. WordPress.com, for example, doesn't allow bloggers to run arbitrary JavaScript.
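As a small illustration of output encoding, one of the server-side measures listed above, here is a sketch that escapes the characters that are special in HTML before user text is echoed into a page, so `<script>` is shown as text instead of being run. `escapeHtml` is a hypothetical helper; real frameworks ship their own context-aware encoders, and this one only covers HTML text and quoted attribute values.

```javascript
// Sketch: escape HTML-special characters in untrusted text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // run first so the "&" in entities added below is not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const comment = '<script>alert("hi")</script>';
console.log(escapeHtml(comment)); // &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```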

convince the doggy with a bone

The sad fact is that web pages in general, a blogger.com post for example, can contain bad content. Even though the server side applies many security measures, such as escaping special characters in HTML code and input/output validation, those measures may still not be enough to stop bad content from slipping through. A normal-looking post, for example, could convince your Ajax robot to run JavaScript that downloads things onto your computer just because you browsed it.
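One classic way bad content slips through is a naive blacklist filter. This deliberately flawed, hypothetical `naiveFilter` deletes the literal `<script>` tag in a single pass, and in doing so reassembles the very tag it removed. Escaping special characters (turning `<` into `&lt;`) instead of deleting patterns avoids this particular trick.

```javascript
// Deliberately flawed sketch: single-pass removal of "<script>" tags.
function naiveFilter(html) {
  return html.replace(/<script>/gi, "");
}

const payload = "<scr<script>ipt>alert(1)</script>";
// Removing the inner "<script>" joins "<scr" and "ipt>" into a new "<script>".
console.log(naiveFilter(payload)); // <script>alert(1)</script>
```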

$(document).ready(function() {
   // put Ajax here.
 });

We cannot expect every individual we come across to be nice (most of them are, but a few could be thieves stealing without your notice); for the same reason, we cannot expect every website we come across to be nice.

Unfortunately, it is ultimately the bloggers' conscience that keeps bad content out of web pages.

Fortunately, WordPress.com posts cannot harm your computer, because WordPress.com filters out JavaScript (at least at the time this article was originally written). Google's Blogger allows arbitrary JavaScript, but the search engine won't send the blog any traffic if its AI smells anything bad. Merely mentioning anything related to network security is enough to trigger traffic throttling unless the AI knows the author personally.
