
A toy HTTP server used as a sandbox for learning c++11 features, kqueue & libuv non-blocking IO


jmoyers/http


  • kqueue.c is a stupid small non-blocking http "server" to learn the api
  • tcp.cpp is the same thing, poking around in c++
  • parser.h is a toy http header parser
  • going to be pushing on to websockets

networking api notes

 Server transport state machine
 1. Acquire socket with socket(), set fd to nonblocking with fcntl()
 2. Bind socket to address/port with bind()
 3. Begin listening with listen()
 4. Subscribe to kqueue with kqueue(), EV_SET(), kevent()
 5. Loop for kqueue events with kevent()
  a. An event whose ident is your listening socket means a new client has connected
  b. A read event on a client socket means that client sent data you can now read
  c. An event with EV_EOF set means the client has disconnected
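
Steps 1 to 3 above can be sketched with plain POSIX calls. This is a minimal sketch, not the code from kqueue.c or tcp.cpp: the helper name make_listener, the loopback address, and the default backlog are assumptions made for illustration. The kqueue registration in step 4 is left out because <sys/event.h> only exists on BSD and macOS.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns a non-blocking listening descriptor bound to
// 127.0.0.1, or -1 on failure. Port 0 asks the kernel to pick a free port.
int make_listener(std::uint16_t port, int backlog = 128) {
  assert(backlog > 0);

  // Step 1: acquire the socket and switch it to non-blocking mode.
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    close(fd);
    return -1;
  }

  // Step 2: bind to address/port. Step 3: start listening.
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, backlog) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

The returned descriptor is what would be passed as the ident when subscribing with EV_SET() and kevent() in step 4.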

 Client transport state machine
 1. Acquire socket with socket(), set fd to nonblocking with fcntl()
 2. Subscribe to kqueue for this socket with kqueue(), EV_SET(), kevent()
 3. Connect to the server with connect(), passing the socket and the server address
 4. Send a message to the server with write() and a buffer
 5. Loop for kqueue events with kevent()
  a. You can get read events for server returning traffic to you
  b. You can get an eof from the server, which means you close the socket

  In TCP, a socket represents one connection, both send and receive.
  When you listen on a port, kqueue notifies you of new connections on the
  listening socket; accept() then hands you a new socket per client to read
  from, write to, and close as you will.
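
The point about accept() handing back a separate socket per connection can be checked in a single process over loopback. This is a rough sketch that uses blocking sockets to stay short, whereas the notes above use non-blocking sockets and kqueue. The function name accept_gives_new_socket is made up for this example.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: returns true when accept() yields a descriptor distinct
// from the listening one and bytes written by the client arrive on it.
bool accept_gives_new_socket() {
  int listener = socket(AF_INET, SOCK_STREAM, 0);
  int client = socket(AF_INET, SOCK_STREAM, 0);
  int conn = -1;
  bool ok = listener >= 0 && client >= 0;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel choose a free port
  socklen_t len = sizeof(addr);
  sockaddr* sa = reinterpret_cast<sockaddr*>(&addr);

  // Listen, then read back the port the kernel picked.
  ok = ok && bind(listener, sa, len) == 0 && listen(listener, 1) == 0 &&
       getsockname(listener, sa, &len) == 0;
  assert(!ok || addr.sin_port != 0);

  // The handshake completes against the listen backlog, so connect()
  // returns before accept() is called.
  ok = ok && connect(client, sa, len) == 0;
  if (ok) {
    conn = accept(listener, nullptr, nullptr);
    ok = conn >= 0 && conn != listener;
  }
  if (ok) {
    char buf[8] = {};
    ok = write(client, "hi", 2) == 2 &&
         read(conn, buf, sizeof(buf)) == 2 &&
         buf[0] == 'h' && buf[1] == 'i';
  }

  if (conn >= 0) close(conn);
  if (client >= 0) close(client);
  if (listener >= 0) close(listener);
  return ok;
}
```

The listening descriptor stays open for further connections; only the accepted descriptor carries this client's traffic.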

server
  purpose
    application layer
    listens on addr/ports
    track connected clients - RAII for sockets
    ? high level protocol events, e.g. get(path, req, res)
  subscribe - n listeners
    client connect
    client disconnect
    read
    eof

    ? http
      request
        client
        read
      response
        client
        send
        end
      verb(path, request, response)
  transport
    purpose
      abstract tcp/udp/etc, kqueue/epoll/completion
      acquire/dispose of sockets
      listen/send
      suitable for both a client or server

    socket (listening)
      socketstate
      descriptor

    bind 
    listen

    subscribe
      events - n listeners
        client connect
        client disconnect
        read

    clients
      address
      socket
        socketstate
        descriptor

    send - data, length, client
    close - client
    shutdown - client

    run
      kqueue loop
        client connect << notified a new client connecting
          call accept: get new client
            socket descriptor
            address
          notify: client connect - client
        read << notified of new data available on descriptor
          get socket from kqueue
          call recv
          event: read - client, data, length
        eof << notified of eof received on descriptor
          flags & EV_EOF
          currently we close()
          event: client disconnect - client


buffer
  purpose
    efficient
      hold buffer from network api with minimal copying
      reusable
      ? doesn't necessarily memset 0 (malloc not calloc)
    provide some bounds checking
    provide convenient interface for parser
    ? handles lists of buffers and indexes into them
  discuss
    std::string
      notes
        string vs wstring - http://goo.gl/hXOCVU
        ? string still not good at manipulating utf8?
        ? std::string can store unicode stuff, not manipulate (UTF8)
        ? std::wstring can manipulate (UCS-2)
      pros
        can handle multiple null terminators
        has bounds checking
        convenient for parser
        contiguous storage since C++11 (http://goo.gl/KJKQbB)
      cons
        .data() and .c_str() return const pointers in C++11 (non-const .data() arrived in C++17)
        consequently no writable buffer to hand off to a c api, short of &s[0]
        wide api surface for data that is not text
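
A small sketch of two points from the std::string list: an explicit length keeps embedded null bytes, and contiguous storage makes &s[0] usable as a writable pointer even though .data() is const in C++11. The helper names from_bytes and copy_via_pointer are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Build a string from raw bytes; the explicit length keeps embedded '\0'.
std::string from_bytes(const char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  return std::string(data, len);
}

// Fill a pre-sized string through &s[0], the way a c api would write into it.
// Relies on std::string storage being contiguous (guaranteed since C++11).
std::string copy_via_pointer(const char* src, std::size_t len) {
  std::string s(len, '\0');
  if (len > 0) std::memcpy(&s[0], src, len);
  return s;
}
```

Note that strlen() on c_str() still stops at the first null, so any code that treats the contents as a c string sees a shorter buffer than size() reports.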
    std::vector<char|unsigned char|wchar_t|uint8_t>
      notes
        customize buffer class with template
          TCPTransport<std::vector<uint8_t>> - "concept" api
            .data() for pointer to element 0
            subscript &data[0]
            size() for bounds checking
        ascii table - http://www.asciitable.com/
          0 to 127 non-extended
          128 - 255 extended
          ? non-extended fit in either char or unsigned char
        char
          const char *test = "ol"
          c standard: char isn't guaranteed to be 8 bit
          c standard: char isn't guaranteed not to have padding bits
          in practice: signed integers are always two's complement.
          in practice: an integer always uses all bits allocated.
          ? implementation-defined (not undefined) before C++20: if c is signed char[], c[0] = 0xF0
          ? due to signedness, not suitable for arbitrary bitwise operations
          unsigned vs signed as buffer - http://goo.gl/XTc00s
        unsigned char
          http://goo.gl/1Zwr7M
          no padding bits
          bitwise operations produce no undefined behavior, trap representations, overflow
          trap representation - http://goo.gl/eQpIBi
            bit pattern stored in a var of type x that causes UB if read as a value of that type
          can alias any data type
        uint8_t
          guaranteed to be 8 bit
          works with shifts, unary, etc
          ? not guaranteed to exist on some embedded systems
          ? makes parsing weird, can't do buf[index] == 'a'
        wchar_t
          good for holding unicode
          wchar_t = 2 bytes on windows, 4 bytes on linux
          wchar_t is not tied to unicode, but guarantees storage
        bool
          has a specialization for extremely compact storage
          http://www.cplusplus.com/reference/vector/vector-bool/
          not good for processing
          variable size bitset
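
A short sketch of the signedness concern in the char and unsigned char notes: converting to unsigned char before shifting gives the same answer whether plain char is signed or unsigned on the platform. It also suggests the buf[index] == 'a' comparison still works for uint8_t, since both sides are promoted to int. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// High nibble of a byte. The cast to unsigned char happens first, so the
// shift operates on a value in 0..255 regardless of char's signedness.
unsigned high_nibble(char c) {
  return static_cast<unsigned>(static_cast<unsigned char>(c) >> 4);
}

// Comparing a uint8_t against a character literal: both operands are
// promoted to int, so ASCII letters compare as expected.
bool is_letter_a(std::uint8_t b) {
  return b == 'a';
}
```

Shifting a plain char directly would sign-extend on platforms where char is signed, which is the case the notes are worried about.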
      pros
        .reserve() and .resize() make it resizable
        bounds checking
        .data() provides a nice underlying block of contiguous memory
        can hand off &vec[0] or .data() to c functions like recv
      cons
        slightly less convenient for parser
        std::string(v.data(), v.size()) is a copy, no safe way without a copy
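
The "hand off .data() to c functions like recv" point might look like the sketch below. The helper name recv_into_vector and its capacity parameter are assumptions, not part of the repository. Note that std::vector zero-fills on construction, so this does not get the "malloc not calloc" property wished for in the buffer section.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Hypothetical helper: receive up to `capacity` bytes from fd straight into
// a vector's storage, then shrink the vector to the byte count received.
// Returns an empty vector on error or eof.
std::vector<std::uint8_t> recv_into_vector(int fd, std::size_t capacity) {
  assert(capacity > 0);
  std::vector<std::uint8_t> buf(capacity);  // value-initialised, i.e. zeroed
  ssize_t n = recv(fd, buf.data(), buf.size(), 0);
  buf.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
  return buf;
}
```

After the call, size() is the number of bytes actually read, which is what a parser would use for bounds checking.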
    custom buffer
      needs a pointer, a size, and an offset
      very simple
      could use realloc, memcpy directly
      could implement << and >> and other stream interfaces
      could implement .begin() and .end() for range for
      iterators?
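
One possible shape for the custom buffer, as a minimal sketch: a pointer, a size, and a read offset, grown with realloc and filled with memcpy, with begin()/end() so range-for works over the unread bytes. The class name Buffer, the 64-byte starting capacity, and the doubling policy are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

class Buffer {
 public:
  Buffer() = default;
  Buffer(const Buffer&) = delete;             // owns raw memory, no copies
  Buffer& operator=(const Buffer&) = delete;
  ~Buffer() { std::free(data_); }

  // Append len bytes, doubling capacity via realloc when it runs out.
  void append(const void* src, std::size_t len) {
    if (len == 0) return;
    if (size_ + len > capacity_) {
      std::size_t cap = capacity_ ? capacity_ : 64;
      while (cap < size_ + len) cap *= 2;
      void* p = std::realloc(data_, cap);
      if (!p) throw std::bad_alloc();
      data_ = static_cast<std::uint8_t*>(p);
      capacity_ = cap;
    }
    std::memcpy(data_ + size_, src, len);
    size_ += len;
  }

  // Mark n unread bytes as consumed, e.g. after the parser handled them.
  void consume(std::size_t n) {
    assert(n <= remaining());
    offset_ += n;
  }

  std::size_t remaining() const { return size_ - offset_; }

  // Iterators over the unread region only.
  const std::uint8_t* begin() const { return data_ + offset_; }
  const std::uint8_t* end() const { return data_ + size_; }

 private:
  std::uint8_t* data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
  std::size_t offset_ = 0;
};
```

Memory from realloc is not zeroed, which matches the "malloc not calloc" wish above; the cost is that bytes past size_ must never be read.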

client
  purpose
    be able to store list of connected clients
    holds address, socket, socketstate, which transport
    part of allowing server to be io api agnostic (kqueue, epoll, completion)

  socket
    socketstate
    descriptor
  address
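
A sketch of how the client's socket and address could be held with RAII, so that dropping a client from the list closes its descriptor. The names Socket, Client, is_open, release, and reset are hypothetical, and socketstate is left out.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <unistd.h>
#include <utility>

// Move-only owner of a file descriptor; closes it on destruction.
class Socket {
 public:
  Socket() = default;
  explicit Socket(int fd) : fd_(fd) {}
  Socket(const Socket&) = delete;
  Socket& operator=(const Socket&) = delete;
  Socket(Socket&& other) noexcept : fd_(other.release()) {}
  Socket& operator=(Socket&& other) noexcept {
    if (this != &other) {
      reset();
      fd_ = other.release();
    }
    return *this;
  }
  ~Socket() { reset(); }

  int descriptor() const { return fd_; }
  bool is_open() const { return fd_ >= 0; }

  // Give up ownership without closing.
  int release() noexcept {
    int fd = fd_;
    fd_ = -1;
    return fd;
  }

  // Close the descriptor if one is held.
  void reset() noexcept {
    if (fd_ >= 0) {
      close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_ = -1;
};

// What the server would keep per connected client.
struct Client {
  Socket socket;
  sockaddr_in address{};
};
```

Because Socket is move-only, Client is too, which fits storing clients in a std::vector or std::map without accidentally duplicating a descriptor.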

  Does send/recv/transport even belong here?
  ? bound_transport
  ? stream_in
  ? stream_out
  ? send
  ? recv

event - abstract kqueue events, api compatible with epoll
  subscriber
    std::function(client, flags)
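
The subscriber idea could be sketched as a list of std::function callbacks per event, matching the "n listeners" notes in the server section. ClientRef is a hypothetical stand-in for the real client type, and the flags are passed through as a plain unsigned value rather than kqueue's own constants.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the client type described above.
struct ClientRef {
  int descriptor;
};

// One event (connect, disconnect, read, ...) with any number of subscribers.
class Event {
 public:
  using Subscriber = std::function<void(const ClientRef&, unsigned)>;

  void subscribe(Subscriber s) {
    assert(s != nullptr);  // reject empty std::function objects
    subscribers_.push_back(std::move(s));
  }

  // Call every subscriber in subscription order.
  void emit(const ClientRef& client, unsigned flags) const {
    for (const auto& s : subscribers_) s(client, flags);
  }

 private:
  std::vector<Subscriber> subscribers_;
};
```

The transport's run loop would call emit() after translating a kqueue (or epoll) event, so subscribers never see the underlying io api.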

performance
  tools
    gperftools (google) - http://goo.gl/GKT8D7
  slab allocators
    why?
      still reading
    "Memory Allocation for Long-Running..." - http://goo.gl/IRME5u
    jemalloc (canonware) - http://goo.gl/4wYkgS
    jemalloc (facebook) - http://goo.gl/DX79Yk
    jemalloc tech talk - http://goo.gl/2YLWTx, alt: http://goo.gl/fsdZ3n
    tcmalloc (google) - http://goo.gl/GKT8D7
    data locality - http://goo.gl/y6SsFi
    locality of reference - http://goo.gl/tyk6uV
    "Memory allocator microbenchmark results are notoriously difficult to extrapolate to real-world applications"
