Crypto Island


Crypto Taxes the Hard Way

Posted 2023-02-18
Updated 2023-02-25
Tags: hledger, haskell, nix, taxes, accounting, reproducibility, git

Disclaimer 1: nothing on this blog is advice about the substance of your taxes! I have no background in accounting and no idea whether this code will produce valid results. You need to verify everything yourself and then own your own mistakes or hire a tech-savvy CPA (or equivalent in your country) to go over it and fix any problems. That’s what I’ll be doing.

Disclaimer 2: this really is the hard way. If your taxes are relatively simple, consider trying one of the standard crypto tax subscription services or hiring a human instead. I would guess the break even point is probably around 5-10 different arcane data formats (including banks, exchanges, and wallets). If you’re dealing with more than that, it might be worth setting this up.

With that out of the way, I’ve recently started treating my taxes as a software/data science pipeline! It hasn’t been easy, but it has worked better so far than all the easier-sounding ways I’ve attempted to organize them before. Read on if you think you might be in the same boat…

Full-fledged hledger

Plain text accounting is an obvious win in general because you can version control it. But there are several good tools to choose from and many of their features overlap. I’ve gone with hledger (so far) mainly so I could follow this excellent “full-fledged hledger” tutorial. I like the principles in the README and the wiki. For me, this is the most important part:

It should be easy to work towards eventual consistency. Large and daunting tasks (like “I will process 10 years of paper mortgage statements” or “I want to import 5 years of paypal payments”) should not require one big push and perfect planning to finish them. Instead I should be able to do them bit by little bit, leaving things half-done, and picking them up later with little (mental) effort. Eventually my records would be perfect and consistent.

I might adopt the format advocated by the related hledger-flow project at some point too.

What to expect from these posts

Today is a “relatively quick start” guide based on 02-getting-data-in in the full-fledged tutorial with mock exchange data rather than a bank account. I suggest starting your own repo right now, working through this first post, working through the rest of that tutorial, and finally coming back here later for crypto-specific addons like:

You’ll probably invent some addons of your own too, and I’d love to hear about them!

Start your repo

Here is a tarball of today’s code to use as a template. I’ll assume you use git for simplicity, but nothing important relies on that. You can also read through it on GitHub.

Installing dependencies will probably take at least a couple minutes. I use Nix whenever I expect a project to involve more than one language. To try it that way run the Nix install script, then open a new terminal and start nix-shell inside the repo. Alternatively the full-fledged tutorial includes a Dockerfile with a pre-built image you can pull.

Look around

High level tour of the code

Your new repo should look roughly like this (omitting generated files):

├── shell.nix (or Dockerfile)
├── 2023.journal
├── all.journal
├── config.journal
├── import
│   └── mockex
│       ├── mockex.rules
│       ├── csv
│       │   └── trades-2023.csv
│       └── csv2journal
└── export
    └── export.hs

Manually written hledger journal files go at the top level, input data + code to parse it in import/, and the export script + everything it generates in export/. The main journal for each tax year depends on the previous year’s main + generated journal files. That makes it possible to automatically open and close the books.

Most of the interesting logic is in export/export.hs. It’s written with Shake, which is like Make for Haskell. It parses the journal files for include statements and uses those + some hard-coded rules to build a dependency DAG and update the relevant files in order when one of the inputs changes. is just a more obvious entry point.

Follow data from one CSV file

Here is an ASCII diagram of data flow through the “full-fledged” system. It’s confusing at first but worth learning because it was refined over ~10 years of successful journal maintenance. For today we’ll focus on only the parts needed to answer the question, “What happens when you add or edit a CSV file?”

import/mockex/csv/* (1)
      |  csv2journal (2)
import/mockex/journal/* (3)
|2023.journal| (4)
     | |
     | v
     | export/2023-* (5)
|all.journal| (6)
  1. The only data source so far is “MockEx”, a made up exchange. The trade data starts in import/mockex/csv/trades-2023.csv.
"Transaction ID","Time","Type","Asset","Amount","Fee","Price"
  1. csv2journal parses the csv according to import/mockex/mockex.rules, which is written in the hledger CSV rules format. You’ll need that because everyone seems to make up their own CSV conventions as they go.

  2. The generated import/mockex/journal/trades-2023.journal in hledger format goes here. You’ll mainly look at these per-import journal files to debug your CSV rules.

2023-02-01 (078blk23598s) MockEx buy
    assets:exchanges:mockex       BTC0.0100
    assets:exchanges:mockex          USD-35
    expenses:fees                 BTC0.0001

2023-05-01 (078werfgsdaf) MockEx buy
    assets:exchanges:mockex       BTC0.0100
    assets:exchanges:mockex          USD-45
    expenses:fees                 BTC0.0001
  1. You include the per-import journal file in 2023.journal by hand. That tells export/export.hs to generate it from the CSV and hledger to read it. You might also add some transactions here by hand. In this case there’s an opening balance.
;; Settings you want in all your journals
include ./config.journal

;; Opening balances
;; This only needs to be done once for the first year you track
;; After that there are auto-generated opening + closing transactions
2023/01/01 opening balances
  assets:exchanges:mockex    = 100.00 USD
  equity:opening balances

;; Add not-yet-generated files here to tell export.hs to generate them
;; from the corresponding CSV inputs
include ./import/mockex/journal/trades-2023.journal
  1. export/export.hs generates financial reports here along with 2023-all.journal. It looks trivial now (see next section) but once you have dozens of data sources it’s is very helpful to see them merged into one linear history.

  2. You include the journal for each year into all.journal by hand. It’s what you load to look at your finances interactively.

Try some reports

First, generate all the files:

[nix-shell]$ ./
# csv2journal (for ../import/mockex/journal/trades-2023.journal)
# hledger (for 2023-balance-sheet.txt)
# hledger (for 2023-all.journal)
# hledger (for 2023-cash-flow.txt)
# hledger (for 2023-income-expenses.txt)
Build completed in 0.09s

This will fail if you have any unbalanced transactions according to hledger’s version of standard double-entry accounting rules. If you’ve ever worked with Haskell this will be a similar experience: the compiler complains over and over until every little thing is fixed in your CSV rules, then suddenly it all works and magically writes a consistent history to 2023-all.journal. Cool, right?

2023-01-01 opening balances
    assets:exchanges:mockex                 = 100.00 USD 
    equity:opening balances

2023-02-01 (078blk23598s) MockEx buy 
    assets:exchanges:mockex      0.0100 BTC 
    assets:exchanges:mockex       -35.0 USD 
    expenses:fees                0.0001 BTC 

2023-05-01 (078werfgsdaf) MockEx buy 
    assets:exchanges:mockex      0.0100 BTC 
    assets:exchanges:mockex       -45.0 USD 
    expenses:fees                0.0001 BTC 

The rest of the exported files are standard financial reports. For example a balance sheet:

Balance Sheet 2023-05-01

                         ||            2023-05-01 
 Assets                  ||    
 assets:exchanges:mockex || 0.0200 BTC, 20.00 USD 
                         || 0.0200 BTC, 20.00 USD 
 Liabilities             ||    
 Net:                    || 0.0200 BTC, 20.00 USD 

You should version control all of them so you can diff them later! One of the main benefits of this system is being able to refactor aggressively and see what changed. Try messing up the sign of a number or the name of a field in mockex.rules and re-running it. You should either get an Hledger error about improper transactions or it will succeed and you can git diff your final reports. One way to get a good diff would be to change the name of the assets:exchanges:mockex account everywhere.

Finally, play around with some interactive hledger commands:

[nix-shell]$ hledger -f all.journal reg cur:USD assets
2023-01-01 opening balances  assets:exchanges:mockex  100.00 USD  100.00 USD
2023-02-01 MockEx buy        assets:exchanges:mockex   -35.0 USD   65.00 USD
2023-05-01 MockEx buy        assets:exchanges:mockex   -45.0 USD   20.00 USD
[nix-shell]$ hledger -f all.journal reg cur:BTC
2023-02-01 MockEx buy  assets:exchanges:mockex  0.0100 BTC  0.0100 BTC
                       expenses:fees            0.0001 BTC  0.0101 BTC
2023-05-01 MockEx buy  assets:exchanges:mockex  0.0100 BTC  0.0201 BTC
                       expenses:fees            0.0001 BTC  0.0202 BTC
[nix-shell]$ hledger -f all.journal bal assets
          0.0200 BTC
           20.00 USD  assets:exchanges:mockex
          0.0200 BTC
           20.00 USD  

Your finances will end up being too complicated for any one command to give a good overview, but you can do lots of small checks to build confidence that particular things are going well, then codify them as new report files to check for regressions. Towards eventual consistency!

Commit to learning it?

If you’re still interested at this point, take a break to let things sink in (really!), then work through at least part of the “full-fledged hledger” series + do whatever other research seems important. Decide whether you would feel comfortable committing a bunch of time and energy to this stuff.

Speaking of which, the one major disadvantage to hledger (vs ledger or beancount) as I see it is the lack of built-in capital gains handling. There are hacky ways to work around that—I’ll do a post about my solution soon—but it’s something to be aware of from the beginning. I decided the tutorial + CSV parsing infrastructure makes it worth using anyway.

The general idea of plain text accounting is sound, so I think it makes sense to commit to learning it but only provisionally use a specific tool. The journal formats are mostly compatible, minus a few edge cases like capital gains lot handling. So just get started! Once you have your data collected + formatted and a good pipeline structure, changing a few of the commands it invokes isn’t as big a deal as it probably sounds now.