Introduction to Binarylang

So one day, I'm working on a project and I realize that I need to send and receive HTTP requests at the socket level. Working that low means I can't easily reuse the standard library's code, because its HTTP parsing is baked into other functions.

So, I'm given a response that looks a little something like this:

echo msg

HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 10
Connection: Closed
Content-Type: text/html; charset=iso-8859-1

here is me

Now there are quite a few ways to parse this. Regex is probably a pretty good cross-language way of handling it, although then you'll have two problems instead of one. Nim's strscans module (home of the scanf macro) would also probably handle this pretty well, as would some sort of generic parser using parseutils or npeg.

However, none of these tools really address an important issue: how do I serialize the object?

If I want to be able to both parse the HTTP header from a server and create one as a client, shouldn't that code be pretty much identical? Nim has strong metaprogramming capabilities, so why can't I just define what my format looks like and have a parser/serializer generated from that?
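To make the complaint concrete, here is a minimal sketch of doing both directions by hand for just the status line, using only std/strutils. The names StatusLine, parseStatusLine and serializeStatusLine are made up for this illustration and have nothing to do with binarylang. Notice that the layout of the line ends up written down twice, once in each proc.

```nim
import std/strutils

type StatusLine = object  # hypothetical type, for illustration only
  version, code, msg: string

proc parseStatusLine(line: string): StatusLine =
  ## Hand-written parser for "HTTP/<version> <code> <msg>".
  if not line.startsWith("HTTP/"):
    raise newException(ValueError, "not an HTTP status line")
  # Split the remainder into at most three parts, so the reason phrase
  # ("Not Found") keeps its internal space.
  let parts = line["HTTP/".len .. ^1].split(' ', maxsplit = 2)
  if parts.len != 3:
    raise newException(ValueError, "malformed status line")
  StatusLine(version: parts[0], code: parts[1], msg: parts[2])

proc serializeStatusLine(s: StatusLine): string =
  ## Hand-written serializer: the same layout, spelled out a second time.
  "HTTP/" & s.version & " " & s.code & " " & s.msg

let parsed = parseStatusLine("HTTP/1.1 404 Not Found")
echo parsed
echo serializeStatusLine(parsed)
```

If the format ever changes, both procs have to change in lockstep, which is exactly the kind of duplication a single declarative description would avoid.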

Enter binarylang.

Now, I can declare what my type looks like, and let it generate everything else for me. Let's try parsing just the first line in that HTTP header, shall we?

struct(http):
  s: _ = "HTTP/"
  s: version
  s: _ = " "
  s: code
  s: _ = " "
  s: msg
  s: _ = "\n"
print toHTTP(msg)

toHttp(msg)=Http(version:"1.1", code:"404", msg:"Not Found")

So what's going on here? First, we tell binarylang to go ahead and create a type called http, for parsing. Next, we use it to describe the format of the first line of the header, field by field. The line starts with the literal string HTTP/, followed by a version. The version is also a string, so we prefix it with an s. After that comes a single space separating the version from the status code, and we continue on in a similar fashion. Here, everything we are parsing happens to behave like a string, so every field can use the string type. An underscore (_) simply signifies that we don't care enough about that value to name it. It is good for what are called "magic" values (fixed content that must be present), or for skipping ahead to the field you actually care about.
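If it helps to see the idea without any macros, here is a rough sketch of what matching that description against a string could look like when written by hand. This is not binarylang's generated code, only my reading of the behaviour described above: a magic value must be present and is skipped, and a named string field runs up to the next magic value. The helpers expectMagic and readUntil are hypothetical names invented for this sketch.

```nim
import std/strutils

proc expectMagic(input: string, pos: var int, magic: string) =
  ## Consume a fixed "magic" string at `pos`, or fail if it is not there.
  if not input.continuesWith(magic, pos):
    raise newException(ValueError,
      "expected " & magic.escape & " at offset " & $pos)
  inc pos, magic.len

proc readUntil(input: string, pos: var int, magic: string): string =
  ## Read a named string field: everything up to, but not including,
  ## the next occurrence of `magic`.
  let stop = input.find(magic, pos)
  if stop < 0:
    raise newException(ValueError, "terminator " & magic.escape & " not found")
  result = input[pos ..< stop]
  pos = stop

var pos = 0
let line = "HTTP/1.1 404 Not Found\n"
expectMagic(line, pos, "HTTP/")        # s: _ = "HTTP/"
let version = readUntil(line, pos, " ") # s: version
expectMagic(line, pos, " ")            # s: _ = " "
let code = readUntil(line, pos, " ")    # s: code
expectMagic(line, pos, " ")            # s: _ = " "
let msg = readUntil(line, pos, "\n")    # s: msg
expectMagic(line, pos, "\n")           # s: _ = "\n"
echo (version, code, msg)
```

Each call lines up with one field of the struct, which is a decent way to think about what the declaration is asking for.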

Since binarylang operates on bitstreams, we turn the string into one, and then tell it to parse into an object.

So, we have a functioning parser for the example header. But what if we want to generate a header instead?

var httpHeader = HTTP(version: "1.1", code: "200", msg: "OK")
echo httpHeader.fromHTTP

HTTP/1.1 200 OK

Wow. It pretty much just works.

Next are the headers. We could try to proceed as we did earlier and spell out each header as its own field, but that won't quite work. HTTP headers can come in any order, and more importantly, they can be anything. So, we need some way to parse a sequence of headers. First, let's define a type for the header itself.

struct(header):
  s: name
  s: _ = ": "
  s: value
  s: _ = "\n"
print "Server: Apache/2.2.14 (Win32)\n".toHeader

toHeader("Server: Apache/2.2.14 (Win32)\n")=Header(name:"Server", value:"Apache/2.2.14 (Win32)")

Fantastic! We now have a way to parse a single header line. Of course, we need to handle a list of these somehow. Thankfully, binarylang has us covered.

struct(http2):
  s: _ = "HTTP/"
  s: version
  s: _ = " "
  s: code
  s: _ = " "
  s: msg
  s: _ = "\n"
  *header: {headers}
  s: _ = "\n"
print msg.toHTTP2

toHttp2(msg)=Http2(
  version:"1.1",
  code:"404",
  msg:"Not Found",
  headers:@[
    Header(name:"Date", value:"Sun, 18 Oct 2012 10:36:20 GMT"),
    Header(name:"Server", value:"Apache/2.2.14 (Win32)"),
    Header(name:"Content-Length", value:"10"),
    Header(name:"Connection", value:"Closed"),
    Header(name:"Content-Type", value:"text/html; charset=iso-8859-1")
  ]
)

Hold on, what's going on here? What's up with all of the weird * and {}? Doesn't * mean a public property in Nim?

The * can be used for two different things in binarylang. It can either make a field public, or refer to an existing parser that is being used as a type. In this case, we use it to refer to the header type that we defined earlier. As for the {headers}, the curly braces denote "read into a seq until the next value can be parsed". So we try to parse a header, and after each one we check whether the next thing on the stream is a newline. If it is, we stop parsing headers and finish; otherwise we keep adding to the seq. Since HTTP headers use newlines to delimit the different sections, this works out fine.
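The "keep going until the terminator shows up" loop described above can be sketched in plain Nim like this. Again, this is only an illustration of the idea under my own assumptions, not what binarylang actually generates; Header here is a hand-declared object and parseHeaders is a hypothetical helper.

```nim
import std/strutils

type Header = object  # hand-declared stand-in, for illustration only
  name, value: string

proc parseHeaders(input: string, pos: var int): seq[Header] =
  ## Keep reading "name: value\n" entries until the next thing on the
  ## stream is the terminator, a bare "\n".
  while pos < input.len and input[pos] != '\n':
    let colon = input.find(": ", pos)
    let eol = input.find('\n', pos)
    if eol < 0 or colon < 0 or colon > eol:
      raise newException(ValueError, "malformed header at offset " & $pos)
    result.add Header(name: input[pos ..< colon],
                      value: input[colon + 2 ..< eol])
    pos = eol + 1  # step past the "\n" that ends this header

var pos = 0
let raw = "Content-Length: 10\nConnection: Closed\n\nhere is me"
let headers = parseHeaders(raw, pos)
echo headers
echo raw[pos + 1 .. ^1]  # skip the blank-line terminator; the rest is the body
```

The check at the top of the loop plays the role of the curly braces: it peeks at what comes next and only parses another header if the terminator is not there yet.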