Reading Files in Unison
How do you do I/O in a pure functional language using Algebraic Effects?
I have covered how to deal with side effects in Unison previously, but I could not make space for discussing I/O in earlier articles. It is time to remedy that omission.
Reading from and writing to files, sockets, processes, or the console requires you to tag your functions with the IO ability. As discussed in my last Unison article, these abilities bubble up the call stack until you decide to handle them. This is very similar to how methods in Java must be declared as throwing checked exceptions if they call other methods that throw them. Asynchronous programming in many languages replicates much of the same behavior.
One of the peculiarities of this behavior makes me think of a rather famous blog post by Bob Nystrom called What Color is Your Function, from back in 2015. Ted Kaminski followed up with his own reflections on function coloring in relation to learning and using Haskell:
But adding a print makes that function do I/O. So now that function’s type changes because it needs to return IO a instead of a. And all functions that call it need to be changed because those are now doing I/O, too. And all functions that call those functions, and so on. It’s infectious.
(Ted Kaminski)
This feeling is very relatable when dealing with abilities in Unison. However, I will argue that in practice it both looks and feels a lot easier than what I remember from playing with Haskell, as well as many other languages such as Swift, Kotlin, and Zig, which have also embraced tagged unions for optional types and errors.
Enough musings, let's start with a concrete example.
Reading a List of Lines
In Unison, you can open files with the FilePath.open function. It returns a Handle object, which you can use to read individual characters with Handle.getChar, check whether you have reached the end of the file with Handle.isEOF, or read a whole line with Handle.getLine. When you are done with the file, you have to remember to call Handle.close on the handle.
As discussed in a previous Unison article, you can typically skip namespaces like Handle and just write open, close and getLine because Unison will be able to figure out which function you want based on the types of the arguments.
In the following code example, I obtain a handle named file by simply calling open. Unison figures out that open corresponds to the base.IO.FilePath.open function because it sees that the first argument is of type FilePath and the second argument is of type FileMode.
file = open (FilePath filename) Read
Of course, FilePath and Read are not fully qualified here either, but since there are no other FilePath or Read identifiers in any other namespace, the Unison compiler can figure out which definitions we had in mind.
The file handle can be passed to other functions. To demonstrate the bubble-up effect, or function coloring as some call it, I define a function getLines, which calls Handle.getLine repeatedly to build up a list of Text values.
getLines : Handle ->{IO, Exception} [Text]
getLines file =
  if isEOF file then
    []
  else
    line = Handle.getLine file
    line +: (getLines file)
Because getLine uses the IO and Exception abilities, we have to tag our getLines function with those abilities as well. We could have avoided that by handling the abilities inside getLines (remember handle-with statements). Letting them propagate is fine, though, because it keeps our getLines function simple and clean.
Even though exceptions may occur in this code and we are doing I/O, the code looks perfectly normal. That is the benefit of abilities: you can write your code without worrying about all the complex side effects that may occur.
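To make the handle-with alternative concrete, here is a minimal sketch of handling the Exception ability ourselves, so that it disappears from the caller's signature. It assumes the base library's Exception ability has a single raise operation taking a Failure; exceptionToEither and safeGetLines are hypothetical names, and the base library may already offer an equivalent helper.

```unison
-- Hypothetical helper (not from the base library): run a delayed
-- computation, turning a raised Failure into Left and a normal result
-- into Right. Assumes: ability Exception where raise : Failure -> x
exceptionToEither : '{g, Exception} a ->{g} Either Failure a
exceptionToEither thunk =
  handle !thunk with cases
    -- the computation finished normally
    { a } -> Right a
    -- raise was called; the continuation is dropped, so execution stops here
    { Exception.raise f -> _ } -> Left f

-- Hypothetical wrapper around getLines: Exception has been handled,
-- so only IO remains in the signature.
safeGetLines : Handle ->{IO} Either Failure [Text]
safeGetLines file = exceptionToEither '(getLines file)
```

A caller of safeGetLines would then pattern match on Left and Right instead of declaring Exception in its own type.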
To run the getLines function, we need to define a new function that takes no meaningful arguments (only the unit value ()) and returns no value, but which uses the IO and Exception abilities. Such a function can be invoked as the main function of a program from the Unison Codebase Manager (UCM).
readLinesProgram is such a function, and you can execute it by writing run readLinesProgram in the UCM. Notice that we use open to get a handle to a file named "hello.txt" and then use our getLines function to read all the lines in the file before closing it.
readLinesProgram : () ->{IO, Exception} ()
readLinesProgram _ =
  file = open (FilePath "hello.txt") Read
  lines = getLines file
  close file
  foreach lines (line -> printLine line)
To see what has been read, we use the printLine function, which writes Text values to the console. It gets called repeatedly by the foreach function. foreach differs from map in that it doesn't return a collection. Thus, foreach is meant for repeatedly calling functions that don't return values, typically functions with side effects such as printLine.
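The difference from map is easiest to see in a hand-written version. The following is a minimal sketch, not the base library's definition: forEachSketch is a hypothetical name, and it relies only on list pattern matching. It runs the effectful function on each element purely for its effects and returns unit instead of building a new list. The ability variable e lets it pass IO, or any other ability, through to its caller.

```unison
-- Hypothetical re-implementation of a foreach-style function.
-- The {e} ability variable means f may use any abilities (e.g. IO),
-- and forEachSketch passes them through to its caller.
forEachSketch : [a] -> (a ->{e} ()) ->{e} ()
forEachSketch xs f = match xs with
  [] -> ()
  x +: rest ->
    -- run the effect for this element
    f x
    -- continue with the rest; no results are collected
    forEachSketch rest f
```

Inside readLinesProgram, forEachSketch lines printLine would behave like the foreach call.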
Parsing a CSV File
Let us download some data to experiment with. In my article about Unix commands, I process a file with pizza sales data using Unix commands such as tail, cut and tr. Let us download the same data and attempt a similar style of text processing using Unison. The following shell commands download a CSV file and store it as pizzaplace.csv.
❯ RDATASETS=https://vincentarelbundock.github.io/Rdatasets/csv
❯ curl -s $RDATASETS/gt/pizzaplace.csv > pizzaplace.csv
If you examine the file, you will see that it is not formatted very nicely. We want to extract a subset of the data and pretty-print it as shown below:
classic M 13.25
classic M 16.0
veggie M 16.0
chicken L 20.75
veggie L 18.5
supreme L 20.75
supreme L 20.75
supreme M 16.5
supreme M 16.5
supreme M 16.5
First Attempt at Reading a CSV File
We define a function readCSVFile, which takes a filename as an argument, reads that file, and processes each line to produce prettier output. Since the file is large, we also specify how many lines we want to read.
Since Unison is a pure functional language without for-loops, we need to fake one by defining a local loop function that keeps track of the number of iterations, much like a for-loop. Inside it, we check whether we are done iterating or have reached the end of the file by calling Handle.isEOF.
Otherwise, we call getLine to get a single line, which we process with a series of function calls chained together with the pipe |> operator.
readCSVFile : Text -> Nat ->{IO, Exception} ()
readCSVFile filename nlines =
  file = open (FilePath filename) Read
  loop n =
    if n <= 0 || Handle.isEOF file then
      ()
    else
      getLine file
        |> toCharList
        |> filter (not << (==) ?")
        |> fromCharList
        |> Text.split ?,
        |> takeRight 3
        |> join "\t"
        |> printLine
      loop (n - 1)
  loop nlines
  close file
readCSVFileProgram : () ->{IO, Exception} ()
readCSVFileProgram _ =
  readCSVFile "pizzaplace.csv" 5
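The counting loop in readCSVFile can also be tried out without a file. In this minimal sketch, a list of Text stands in for the file handle (an assumption made so it runs without I/O), and firstLines is a hypothetical name. Like the real loop, it stops either when the counter reaches zero or when the input runs out; Nat.drop is assumed to be the base library's subtraction that stops at zero.

```unison
-- Hypothetical pure analogue of the counting loop in readCSVFile:
-- collect at most n lines, stopping early if the input runs out.
firstLines : Nat -> [Text] -> [Text]
firstLines n input =
  -- k counts the remaining iterations, acc collects the result
  loop k remaining acc =
    match remaining with
      -- no more input, like Handle.isEOF returning true
      [] -> acc
      line +: rest ->
        if k == 0 then
          -- done iterating
          acc
        else
          loop (Nat.drop k 1) rest (acc :+ line)
  loop n input []
```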
Allow me to explain what all the chained functions in readCSVFile do. First, we get a line from the file with the getLine file call. This line is turned into a list of characters to make it easier to filter out the " characters. When done filtering, we turn the characters back into a single Text object with the fromCharList function. This Text object is then split into a list of Text objects at each comma; that is what the Text.split ?, function call does for us. We don't want all the columns of data, so we use takeRight 3 to get the 3 rightmost columns, containing the size, type, and price of the pizza. To make these columns presentable, we join them together again with a tab character in between using the join "\t" function call. This results in a single Text object, which is fed to the printLine function.
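Because the pipeline itself has no side effects, it can be pulled out of the loop into a pure function that is easy to test without touching the file. This is a sketch: formatRow is a hypothetical name, it reuses the same functions as readCSVFile, and the sample line in the test is made up to have the same shape as a quoted row in pizzaplace.csv.

```unison
-- Hypothetical pure helper: the per-line pipeline from readCSVFile,
-- minus the final printLine.
formatRow : Text -> Text
formatRow line =
  line
    |> toCharList
    -- drop every double-quote character
    |> filter (not << (==) ?")
    |> fromCharList
    -- split into columns at each comma
    |> Text.split ?,
    -- keep the size, type and price columns
    |> takeRight 3
    -- glue the columns back together with tabs
    |> join "\t"
```

With such a helper, the else branch of loop could print formatRow (getLine file) before recursing.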
We can run the readCSVFileProgram function from the Unison Codebase Manager. First, I like to make a namespace for the functions we are creating here and add dependencies on the types and functions in the base library. Then I use add to add the functions defined in my source code files to the pizza namespace.
.> cd pizza
.pizza> fork .base lib.base
.pizza> add
When we run the pizza program, you can see that we get the columns in the wrong order.
.pizza> run readCSVFileProgram
size type price
M classic 13.25
M classic 16
M veggie 16
L chicken 20.75
L veggie 18.5
L supreme 20.75
L supreme 20.75
M supreme 16.5
M supreme 16.5
M supreme 16.5
S chicken 12.75
S classic 12
S supreme 12.5
S supreme 12.5
When reading data from a file, we often want to create typed objects which we can more easily pass around the system and manipulate. For instance, that will make it easier to change the order of the columns. That is what we will do next to improve the output.
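For comparison, the column order can also be fixed without any new types, by pattern matching on the list of columns just before they are joined. This sketch (reorderColumns is a hypothetical name) works, but it relies purely on positions, so nothing prevents a later change from silently swapping the wrong fields. That fragility is part of what makes typed objects attractive.

```unison
-- Hypothetical helper: swap the first two of the three columns so rows
-- read type, size, price instead of size, type, price.
reorderColumns : [Text] -> [Text]
reorderColumns = cases
  [size, kind, price] -> [kind, size, price]
  -- leave rows with an unexpected shape untouched
  other -> other
```

It could be slotted into the pipeline between takeRight 3 and join "\t".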
Defining Pizza Objects
If we store pizza information in dedicated objects, then we can also define how those objects are represented textually. We define an enum Size to hold the size of the pizza sold.
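As a rough idea of what such an enum can look like, here is a sketch. It is not the definition used in the rest of this post: Size.fromText and Size.toText are hypothetical names, and the five constructors assume the size codes S, M, L, XL and XXL that appear in the pizzaplace data set.

```unison
-- Sketch of an enum for pizza sizes (assumed codes S, M, L, XL, XXL).
unique type Size = S | M | L | XL | XXL

-- Hypothetical parser for the size column; None for anything unexpected.
Size.fromText : Text -> Optional Size
Size.fromText = cases
  "S" -> Some Size.S
  "M" -> Some Size.M
  "L" -> Some Size.L
  "XL" -> Some Size.XL
  "XXL" -> Some Size.XXL
  _ -> None

-- Hypothetical textual representation used when printing a pizza.
Size.toText : Size -> Text
Size.toText = cases
  Size.S -> "S"
  Size.M -> "M"
  Size.L -> "L"
  Size.XL -> "XL"
  Size.XXL -> "XXL"
```

Returning Optional Size from the parser forces callers to decide what to do with a malformed row instead of crashing in the middle of the loop.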
Subscribe to Erik Explores to keep reading this post and get 7 days of free access to the full post archives.