Superpowered keyword args in Haskell

Overview

As programs get larger and more complex, the components therein also grow. This happens on several levels; we often have to deal with growing applications, packages, modules, and test suites. This blog post is about how to deal with the growth of the humble function.

If you’ve ever worked on a production codebase, you know that one function that lives in a module that several others depend on. It sits at the very core of your application, growing in complexity to support the ever-expanding needs of the business. It is the entrypoint into a sea of complexity…and it has several positional arguments that mean nothing without documentation and other context. It is a powerful and useful beast, but it must be tamed.

Domain

Maybe you work on an information system for posts of some kind where a common operation is fetching a set of posts. Various filters/sorts need to be applied conditionally based on whatever the caller needs:

Filter - Posted by a certain person
Filter - Posted before or after a certain date
Filter - Liked by a particular other user
Filter - Commented on by a particular other user
Filter - Accessible to a particular user
Filter - Public only
Sort - Popular posts (top n posts ordered by likes)
Sort - Most controversial posts (top n posts ordered by number of comments)
Sort - Most linked posts (linked to by other posts)

A function

The type signature for the function that does this type of fetch could look something like this:

newtype UserId = UserId Integer
data TimeSpan = TimeSpan { start :: UTCTime, end :: UTCTime }
data Sort = Popular | Controversial | MostLinked

fetchPosts
  :: MonadFetch m
  => Maybe UserId
  -> Maybe TimeSpan
  -> Maybe UserId
  -> Maybe UserId
  -> Maybe UserId
  -> Bool
  -> Sort
  -> m [Post]
fetchPosts userId timeSpan likedBy commentedOnBy accessibleBy publicOnly sort
  = -- omitted

This works, but it is very error-prone. Even though we’re making use of newtype wrappers, the positional arguments could easily get mixed up, causing, for instance, a fetch for posts likedBy a user instead of the expected posts commentedOnBy a user. Every time a call to this function is made, you have to be careful to provide the correct arguments in the correct order.
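
To make the transposition problem concrete, here is a minimal sketch. It assumes a simplified, pure stand-in for the fetch: the names fetchPostsPure, examplePosts, and the Post fields are hypothetical, the posts live in a plain list rather than behind MonadFetch, and only two of the filters are kept.

```haskell
-- Hypothetical, simplified model: only two filters, no MonadFetch.
newtype UserId = UserId Integer
  deriving (Eq, Show)

data Post = Post
  { postId     :: Integer
  , likers     :: [UserId] -- users who liked the post
  , commenters :: [UserId] -- users who commented on the post
  }
  deriving (Eq, Show)

-- Both filters have the same type, so the compiler cannot tell them apart.
fetchPostsPure :: Maybe UserId -> Maybe UserId -> [Post] -> [Post]
fetchPostsPure likedBy commentedOnBy = filter keep
  where
    keep post =
      maybe True (`elem` likers post) likedBy
        && maybe True (`elem` commenters post) commentedOnBy

examplePosts :: [Post]
examplePosts =
  [ Post 1 [UserId 7] [] -- liked by user 7
  , Post 2 [] [UserId 7] -- commented on by user 7
  ]

-- Intended call: posts liked by user 7.
likedBySeven :: [Post]
likedBySeven = fetchPostsPure (Just (UserId 7)) Nothing examplePosts

-- Arguments transposed by mistake: this still type checks, but it
-- returns the posts commented on by user 7 instead.
transposed :: [Post]
transposed = fetchPostsPure Nothing (Just (UserId 7)) examplePosts
```

Here likedBySeven contains only post 1, while transposed quietly returns post 2. Nothing in the types flags the difference.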

A safer approach

One way to improve this situation is to make extensive use of newtype wrappers and avoid Boolean Blindness by utilizing more specific Sum Types. This is a pretty common idiom, and it helps a bit:

data VisibilityFilter = PublicOnly | PublicOrPrivate
newtype AuthorId = AuthorId UserId -- and a bunch more, you get the picture

fetchPosts
  :: MonadFetch m
  => Maybe AuthorId
  -> Maybe TimeSpan
  -> Maybe LikedBy
  -> Maybe CommentedOnBy
  -> Maybe AccessibleBy
  -> VisibilityFilter
  -> Sort
  -> m [Post]
fetchPosts userId timeSpan likedBy commentedOnBy accessibleBy publicOnly sort
  = -- ...

This is safer, but arguably more annoying to work with due to all the manual packing of newtypes that needs to be done, and every call is pretty noisy. Consider what fetching all public posts for a given author would look like:

main = do
  posts <- runFetch $
    fetchPosts
      (Just (AuthorId (UserId 1)))
      Nothing
      Nothing
      Nothing
      Nothing
      PublicOnly
      Popular
  -- ... do something with posts

That is a lot of typing “I don’t care” for one function call, and, although the newtypes have turned argument transposition from a runtime bug into a compile-time error, it is still an annoying problem that we have to deal with somehow.

Additionally, every time we add a new filter to this function, we need to update all of the places it is called. This is a very general function that can be used in many different ways, so that could be a significant number of call sites depending on the size of the project.

The Keyword Args Pattern

The pattern I will describe here solves the argument transposition issue and cuts down on spelling out “I don’t care” when calling a very general function. The trick, if you can call it that, is to stuff all of those parameters into a data type, then make that data type defaultable and allow overriding its fields to deal with special cases like filters and sorting. It reminds me of Keyword Args in languages like Python, so I’ll refer to this as the Keyword Args Pattern.

data FetchPostsArgs = FetchPostsArgs
  { authorId :: Maybe UserId
  , timeSpan :: Maybe TimeSpan
  , likedBy :: Maybe UserId
  , commentedOnBy :: Maybe UserId
  , accessibleBy :: Maybe UserId
  , visibilityFilter :: VisibilityFilter
  , sort :: Sort
  }

defaultFetchPostsArgs :: FetchPostsArgs
defaultFetchPostsArgs = FetchPostsArgs
  { authorId = Nothing
  , timeSpan = Nothing
  , likedBy = Nothing
  , commentedOnBy = Nothing
  , accessibleBy = Nothing
  , visibilityFilter = PublicOrPrivate
  , sort = Popular
  }

fetchPosts :: MonadFetch m => FetchPostsArgs -> m [Post]
fetchPosts FetchPostsArgs{..} = -- ...

Now fetching all public posts for a given author looks like this:

main = do
  posts <- runFetch $
    fetchPosts $ defaultFetchPostsArgs
      { authorId = Just (UserId 1)
      , visibilityFilter = PublicOnly
      }
  -- ... do something with posts

Not only is that easier to write, but it also lets us explicitly specify what we want to change about the general functionality (“fetch all posts”) without needing to explicitly specify what we don’t care about changing (all other filters and sort methods).

It is worth noting that RecordWildCards is used here to bring the record fields into scope under their own names, so the function body needs little or no change.
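
For readers who want to run the pattern end to end, here is a minimal sketch under some loud assumptions: fetchPostsPure is a hypothetical pure stand-in for fetchPosts that takes the "database" as a list, the Post fields are invented for the example, and the record is trimmed to three fields.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Data.List (sortOn)
import Data.Ord (Down (..))

newtype UserId = UserId Integer deriving (Eq, Show)

data VisibilityFilter = PublicOnly | PublicOrPrivate deriving (Eq, Show)
data Sort = Popular | Controversial deriving (Eq, Show)

-- Hypothetical Post shape, just enough to drive the filters below.
data Post = Post
  { postAuthor   :: UserId
  , postIsPublic :: Bool
  , postLikes    :: Int
  , postComments :: Int
  }
  deriving (Eq, Show)

data FetchPostsArgs = FetchPostsArgs
  { authorId         :: Maybe UserId
  , visibilityFilter :: VisibilityFilter
  , sort             :: Sort
  }

defaultFetchPostsArgs :: FetchPostsArgs
defaultFetchPostsArgs = FetchPostsArgs
  { authorId = Nothing
  , visibilityFilter = PublicOrPrivate
  , sort = Popular
  }

-- Pure stand-in for fetchPosts. RecordWildCards binds authorId,
-- visibilityFilter, and sort as ordinary local names.
fetchPostsPure :: FetchPostsArgs -> [Post] -> [Post]
fetchPostsPure FetchPostsArgs{..} = order . filter keep
  where
    keep post =
      maybe True (== postAuthor post) authorId
        && (visibilityFilter == PublicOrPrivate || postIsPublic post)
    order = case sort of
      Popular       -> sortOn (Down . postLikes)
      Controversial -> sortOn (Down . postComments)

examplePosts :: [Post]
examplePosts =
  [ Post (UserId 1) True 3 0
  , Post (UserId 1) False 9 2
  , Post (UserId 2) True 5 1
  , Post (UserId 1) True 8 4
  ]

-- Only the two fields we care about are mentioned at the call site.
publicByUserOne :: [Post]
publicByUserOne =
  fetchPostsPure
    (defaultFetchPostsArgs { authorId = Just (UserId 1), visibilityFilter = PublicOnly })
    examplePosts
```

With this data, publicByUserOne keeps the two public posts by user 1 and orders them by likes, most liked first.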

Adding functionality

Suppose we now want not only to filter by a single user, but also to search for posts that contain certain tags. We could modify FetchPostsArgs to support this case:

data FetchPostsArgs = FetchPostsArgs
  { authorId :: Maybe UserId
  , timeSpan :: Maybe TimeSpan
  , likedBy :: Maybe UserId
  , commentedOnBy :: Maybe UserId
  , accessibleBy :: Maybe UserId
  , visibilityFilter :: VisibilityFilter
  , sort :: Sort
  , tags :: [Tag]
  }

defaultFetchPostsArgs :: FetchPostsArgs
defaultFetchPostsArgs = FetchPostsArgs
  { authorId = Nothing
  , timeSpan = Nothing
  , likedBy = Nothing
  , commentedOnBy = Nothing
  , accessibleBy = Nothing
  , visibilityFilter = PublicOrPrivate
  , sort = Popular
  , tags = []
  }

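and update the body of fetchPosts as needed. Existing call sites that start from defaultFetchPostsArgs compile unchanged, since they never mention the new field.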
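
A small sketch of what this buys at the call sites, assuming a trimmed two-field record and a hypothetical Tag newtype over String (the post does not show how Tag is defined):

```haskell
newtype UserId = UserId Integer deriving (Eq, Show)

-- Assumption: Tag wraps a String.
newtype Tag = Tag String deriving (Eq, Show)

-- Trimmed-down record, with the new tags field already added.
data FetchPostsArgs = FetchPostsArgs
  { authorId :: Maybe UserId
  , tags     :: [Tag]
  }
  deriving (Eq, Show)

defaultFetchPostsArgs :: FetchPostsArgs
defaultFetchPostsArgs = FetchPostsArgs
  { authorId = Nothing
  , tags = []
  }

-- Written before tags existed. It compiles unchanged and picks up the
-- default (no tag filter).
oldCallSite :: FetchPostsArgs
oldCallSite = defaultFetchPostsArgs { authorId = Just (UserId 1) }

-- New call sites opt in to the new filter.
newCallSite :: FetchPostsArgs
newCallSite =
  defaultFetchPostsArgs { authorId = Just (UserId 1), tags = [Tag "Haskell"] }
```

Only defaultFetchPostsArgs and the body of the fetch function have to learn about the new field.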

Utilizing Monoids and Smart Constructors

FetchPostsArgs looks a lot like a Monoid, and we can certainly frame it that way, with a little bit of finagling:

data Visibility = Public | Private | Protected

data FetchPostsArgs = FetchPostsArgs
  { authorId :: Last UserId
  , timeSpan :: Last TimeSpan
  , likedBy :: Last UserId
  , commentedOnBy :: Last UserId
  , accessibleBy :: Last UserId
  , forbiddenVisibilities :: [Visibility]
  , sorts :: [Sort]
  , tags :: [Tag]
  }

defaultFetchPostsArgs :: FetchPostsArgs
defaultFetchPostsArgs = FetchPostsArgs
  { authorId = Last Nothing
  , timeSpan = Last Nothing
  , likedBy = Last Nothing
  , commentedOnBy = Last Nothing
  , accessibleBy = Last Nothing
  , forbiddenVisibilities = []
  , sorts = []
  , tags = []
  }

instance Semigroup FetchPostsArgs where
  (<>) = -- ... pairwise <>
instance Monoid FetchPostsArgs where
  mempty = -- ... every field is `mempty`
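
The instance bodies are elided above. One way they might be written is sketched here, with the record trimmed to four fields to keep it short; the remaining Last and list fields would follow the same shape. The Tag newtype over String and the combined value are assumptions made for the example.

```haskell
import Data.Monoid (Last (..))

newtype UserId = UserId Integer deriving (Eq, Show)
newtype Tag = Tag String deriving (Eq, Show) -- assumption: Tag wraps a String
data Visibility = Public | Private | Protected deriving (Eq, Show)

-- Trimmed to four fields for the sketch.
data FetchPostsArgs = FetchPostsArgs
  { authorId              :: Last UserId
  , likedBy               :: Last UserId
  , forbiddenVisibilities :: [Visibility]
  , tags                  :: [Tag]
  }
  deriving (Eq, Show)

-- Pairwise <>: each field is combined with its own Semigroup instance.
instance Semigroup FetchPostsArgs where
  a <> b = FetchPostsArgs
    { authorId              = authorId a <> authorId b
    , likedBy               = likedBy a <> likedBy b
    , forbiddenVisibilities = forbiddenVisibilities a <> forbiddenVisibilities b
    , tags                  = tags a <> tags b
    }

-- Every field is mempty: Last Nothing for the Last fields, [] for lists.
instance Monoid FetchPostsArgs where
  mempty = FetchPostsArgs
    { authorId              = mempty
    , likedBy               = mempty
    , forbiddenVisibilities = mempty
    , tags                  = mempty
    }

-- Hypothetical composition, to show how the fields merge.
combined :: FetchPostsArgs
combined =
  mempty { authorId = Last (Just (UserId 2)) }
    <> mempty { tags = [Tag "Haskell"] }
    <> mempty { authorId = Last (Just (UserId 3)), tags = [Tag "Code"] }
```

Two behaviours are worth keeping in mind with this choice of field types. For Last fields the right-most value that is set wins, so combined ends up with author UserId 3. List fields accumulate, so its tags are Haskell followed by Code.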

With this formulation, we can build up Smart Constructors and compose them to create more specific queries. For example, suppose we want to find all public posts by User 2 tagged “Haskell” and “Code” posted less than 7 days ago, that I (User 1) liked. Some Smart Constructors can help here:

publicOnlyArgs :: FetchPostsArgs
publicOnlyArgs = mempty { forbiddenVisibilities = [Private, Protected] }

likedByArgs :: UserId -> FetchPostsArgs
likedByArgs userId = mempty { likedBy = Last (Just userId) }

authoredByArgs :: UserId -> FetchPostsArgs
authoredByArgs userId = mempty { authorId = Last (Just userId) }

taggedArgs :: Tag -> FetchPostsArgs
taggedArgs tag = mempty { tags = [tag] }

postedAfter :: UTCTime -> IO FetchPostsArgs
postedAfter time = do
  now <- getCurrentTime
  pure $ mempty { timeSpan = Last (Just (TimeSpan time now)) }

Our fetch then looks like:

main = do
  sevenDaysAgo <- subtractDays 7 <$> getCurrentTime
  timeArgs <- postedAfter sevenDaysAgo

  posts <- runFetch $
    fetchPosts
      $  timeArgs
      <> publicOnlyArgs
      <> authoredByArgs (UserId 2)
      <> likedByArgs (UserId 1)
      <> taggedArgs (Tag "Haskell")
      <> taggedArgs (Tag "Code")

  -- ... do something with posts

Using Monoids gives us better composition of common operations in many cases. I might argue that taking this step is unnecessary for this particular example, but it is at least helpful for illustrative purposes and might come in handy depending on your use case.

A small note

Suppose we were writing a function fetchPostsForUser instead, which always required a UserId to fetch posts for. In this case, it does make sense to pull that individual argument outside of the Args data type and provide it as a required parameter, e.g.:

fetchPostsForUser :: MonadFetch m => UserId -> FetchPostsArgs -> m [Post]

The conceptual reason for this is that fetchPostsForUser has no default behavior without a UserId, and we need default behavior for the Monoid to remain sensible.
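
As a rough sketch of that split, assume a hypothetical pure variant, fetchPostsForUserPure, that works on a list of posts, and an Args type trimmed to a single tags field. The required UserId sits outside the record, so mempty still means "no refinements" rather than "no user".

```haskell
newtype UserId = UserId Integer deriving (Eq, Show)
newtype Tag = Tag String deriving (Eq, Show) -- assumption: Tag wraps a String

-- Hypothetical Post shape for the example.
data Post = Post
  { postAuthor :: UserId
  , postTags   :: [Tag]
  }
  deriving (Eq, Show)

-- Optional refinements only. The required user is deliberately not a field.
newtype FetchPostsArgs = FetchPostsArgs { tags :: [Tag] }

instance Semigroup FetchPostsArgs where
  FetchPostsArgs a <> FetchPostsArgs b = FetchPostsArgs (a <> b)

instance Monoid FetchPostsArgs where
  mempty = FetchPostsArgs []

-- The UserId is a plain positional parameter, so it cannot be left out.
fetchPostsForUserPure :: UserId -> FetchPostsArgs -> [Post] -> [Post]
fetchPostsForUserPure user args = filter keep
  where
    keep post =
      postAuthor post == user
        && all (`elem` postTags post) (tags args)

examplePosts :: [Post]
examplePosts =
  [ Post (UserId 1) [Tag "Haskell"]
  , Post (UserId 1) []
  , Post (UserId 2) [Tag "Haskell"]
  ]
```

Calling fetchPostsForUserPure (UserId 1) mempty examplePosts returns both posts by user 1, and passing FetchPostsArgs [Tag "Haskell"] narrows that to the tagged one.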

In Short

If you are dealing with functions that:

Have many positional arguments
Are called in a variety of ways to accomplish specific goals
Have a well-behaved “default” operation, where the arguments simply specify refinements on that default operation

then the Keyword Args pattern might work for you.

2020-09-22