Category Theory at Chartboost
If you have been following our blogs, you might know that the Adserver team at Chartboost favors statically typed functional programming to build and maintain the microservices that make up our adserving platform. Our ad serving platform is primarily built using microservices architecture and uses several microservices to serve a single ad request. Our ad servers get tens of thousands of requests per second and have a max latency bound of around 250ms, with individual microservice latency requirements as low as 30ms. In this post, I wanted to discuss some of the main engineering challenges for our adserver team, and why we looked into category theory, and what we learned from it and how we currently use it.
Engineering challenges at the Adserver team
- Performance
- Structuring and writing maintainable programs that meet the dynamic needs of the business with high confidence.
Adserver engineering codebase has evolved over the last 8 years. It started as a PHP codebase and then moved onto to a primarily scala codebase to meet the engineering challenges. One of the primary reasons for this move was to handle the performance requirements. We consider performance as one of the primary requirements in any microservice that we build and make as part of our adserver platform. Over the course of the last few years, as we better understand the dynamic nature of the business that we are in, we have realized that the ability of our engineering codebase to adapt to the dynamic nature of the business is a key success metric for the adserver engineering team. We employ the sophisticated type system of scala to write type-safe code, and use compile time checks as much as we can, to capture and eliminate a whole class of bugs and help us refactor our code with confidence. Structuring and writing maintainable code is an invaluable asset in every programmer’s toolkit. In the adserver team, we treat this as the most important trait of any code that we write and release.
Where does Category theory come into play?
Our interest in Category theory was to primarily evaluate it as another tool to introduce abstractions, and reason about our programs, enabling us to discover patterns and introduce changes at a higher level. While we had always used monads and functors, either intentionally or unintentionally, in the past, in the last several months we started using mtl style typeclasses in one of our microservices to encode side effects using type classes and favor tagless-final style of programming. Adserver team’s scala codebase has been using thecats
library to leverage functional programming, for a long time. We tend to favor the typelevel ecosystem, when it makes sense. So, we went with cats-effect
and cats-mtl
to introduce algebraic effects . After running this microservice in production for almost a year, we are super happy with both the performance and how they have allowed us to structure our programs. So, we are happy to add them to our internal microservices “essentials” list.
sealed abstract class ResponsePostProcessorAlg[E <: Endpoint, F[_]: Sync] { def runPostProcessors(snapshot: Snapshot,
qmd: RequestMetaData,
rmd: ResponseMetaData,
r: BidResponse)(implicit F: FunctorTell[F, Chain[LogMessage]]
): F[BidResponse]}
Here’s a typical encoding of one of the tasks in our microservices. The aim of sharing this snippet is to give readers a glimpse of how our newer microservices are shaping up and not teach how to write programs in this style, as that would take a longer and a separate post. Now let’s look at the libraries in play here. The abstract class is parameterized with a phantom type E
(to represent an endpoint that received the request) and an effect type F
which is constrained with a cats-effect
type class Sync
. The method runPostProcessors
allows the logging side effect via the type class FunctorTell
, which introduces the Writer
monad without having to maintain monad transformer stacks(using cats-mtl
). Log messages are aggregated using Chain
datatype and gets published to the logs when the effect F[A]
gets run to produce the result A
. This style of programming is an attempt to follow the tagless-final pattern in which we build algebras that represent the actions that are to be performed and an interpreter that executes the algebra to produce the results. In this snippet, ResponsePostProcessorAlg
is the algebra and it can be built with an effect type that meets the type class constraints Sync
and FunctorTell
to produce the result F[BidResponse]
which then gets executed or interpreted to produce the final result BidResponse
. Some of these might be unfamiliar to a new scala programmer. Our new engineers have found the official docs of cats
, cats-effect
, cats-mtl docs useful. And, the most useful resource for learning category theory, as we have found, is category-theory-for-programmers book by Bartosz Milewski.
With this style of encoding our server tasks, we had much more control over the effects we wanted over a large portion of our codebase, and better yet, understand and reason about the effects they were already using. Just by reading the signature of these algebras
, our engineers were able to reason about the effects a large portion of our codebase allowed and introduce/remove effects as necessary. This was such a nice thing to have as we often add/remove effects to experiment with performance. For example, if we needed to introduce an error effect? Just change the type signature to include a type class that allows it - ApplicativeError
or MonadError
. Now, you can use the error effect in places we wanted to, in an abstract way, without sticking to a concrete Option
or Either
type, and having type signatures like Future[Either[ApplicationError, BidResponse]]
. Since we never program to a concrete type but to an interface, we can always swap out the effect type with another one as long as it meets the type class constraints. For example, you have the option to swap cats-effect
IO effect type with Monix
Task type, if it meets your requirements.
//Before
sealed abstract class RevenueProcessor[E <: Endpoint: EndpointSettings, F[_]: Applicative[?[_]]]//After
sealed abstract class RevenueProcessor[E <: Endpoint: EndpointSettings, F[_]: ApplicativeError[?[_], Throwable]]
Is it all good stories?
As with the introduction of most new technologies, we increased our initial turnaround time for tickets, and the codebase was starting to look advanced, and almost completely unfamiliar to beginners. While we think that introducing abstractions did benefit us, it came to us with the cost of being hard to understand. So, our approach to introducing category theory or any advanced programming language feature is
- to evaluate its merits on a smaller scale
- introduce it to wider audience
- create learning materials
- move on to wider adoption within our codebase.
Adserver hackdays and Making Category Theory more accessible
Adserver hack days are days where we work on any project that we like. We mentioned one of our hack day contribution in our earlier blog posts. We decided to use some of these hack days to learn and introduce category theory for some of our experimental projects. Category Theory For Programmers is a widely known book in the programming community with snippets available in Haskell and Scala. As I was learning Category theory for work and personal interests, I decided to add the snippets in OCaml, and that effort is now complete. The pre-release version can be downloaded now. This took more than a year to finish, and it started because Marcello Seri offered to do the code review. Special thanks to Anton Kochkov for popularizing the effort on various social media platforms and garnering interest which gave the extra push to finish the work.
Do we use OCaml?
Not yet, but it’s of interest to us as it is in the realm of another statically typed functional programming language with solid academic theory and demonstrated industrial use.