Reading CSV, TSV, and invalid CSV files with Golang

CSV is still one of the most popular formats for organizing tabular data. Go is well suited to processing CSV thanks to its performance and ease of use. Let’s see how to handle the most common cases.

Reading a CSV file

To read CSV files, use the Reader from the standard encoding/csv package. We’re going to use the following data.csv in the examples:

cat data.csv
id,name,price
1,Phone,123
2,TV,34
3,Boot,5

In most cases we want to read and process a CSV file record by record, so that even large files never exhaust memory:

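package main

import (
  "encoding/csv"
  "fmt"
  "io"
  "log"
  "os"
)

func main() {
  f, err := os.Open("data.csv")
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  r := csv.NewReader(f)

  for {
    // Read returns one record (a slice of fields) per call.
    row, err := r.Read()

    if err == io.EOF {
      break
    }
    if err != nil {
      log.Fatal(err)
    }

    fmt.Println(row)
  }
}

[id name price]
[1 Phone 123]
[2 TV 34]
[3 Boot 5]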

If we know the CSV file is small, we can use the ReadAll() method to load the entire file at once:

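package main

import (
  "encoding/csv"
  "fmt"
  "log"
  "os"
)

func main() {
  f, err := os.Open("data.csv")
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  r := csv.NewReader(f)

  // ReadAll returns all records as a slice of string slices.
  rows, err := r.ReadAll()
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(rows)
}

[[id name price] [1 Phone 123] [2 TV 34] [3 Boot 5]]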

Reading TSV files and other custom delimiters

In some cases, “CSV” files are not actually comma-delimited (the “C” in “CSV” stands for “comma”), and another character separates the columns. Set the Comma field of the reader to define the delimiter in this case. Let’s read a tab-separated file (tabs separate the columns):

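package main

import (
  "encoding/csv"
  "fmt"
  "io"
  "log"
  "os"
)

func main() {
  f, err := os.Open("data.tsv")
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  r := csv.NewReader(f)
  r.Comma = '\t' // columns are separated by tabs instead of commas

  for {
    row, err := r.Read()

    if err == io.EOF {
      break
    }
    if err != nil {
      log.Fatal(err)
    }

    fmt.Println(row)
  }
}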

Reading CSV with custom quoting symbols

Values in CSV files should be quoted with double quotes, but whoever created the file you have to deal with might have used something else.

Unfortunately, the encoding/csv package doesn’t support custom quote characters. In such cases, we can use an external tool to reformat the file before feeding it to our program. Let’s take the following single-quoted CSV file as an example:

cat data-custom.csv
id,name,price
1,Phone,123
2,'TV, Screens',34
3,Boot,5

We can use the Python csvkit toolset to change the quoting:

csvformat -q "'" data-custom.csv > data-standard.csv

This will produce the following file:

id,name,price
1,Phone,123
2,"TV, Screens",34
3,Boot,5

As we can see, the file now uses double quotes and can be read by our Go program.

Dealing with broken/invalid CSV files

Broken CSV files are common in practice. Let’s try to handle the following broken CSV:

id,name,price
1,Phone,123
7,
2,TV, Screens,34
3,Boot,

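While processing this file, the encoding/csv reader returns an error for each invalid row, which we can catch and handle however we want. By default the reader expects every record to have the same number of fields as the first one, so rows with too few or too many fields are reported: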

package main

import (
  "encoding/csv"
  "fmt"
  "io"
  "log"
  "os"
)

func main() {
  f, err := os.Open("data.csv")
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  r := csv.NewReader(f)

  for {
    row, err := r.Read()

    if err == io.EOF {
      break
    }

    // Report the invalid row and move on to the next one.
    if err != nil {
      fmt.Println(err)
      continue
    }

    fmt.Println(row)
  }
}

[id name price]
[1 Phone 123]
record on line 3: wrong number of fields
record on line 4: wrong number of fields
[3 Boot ]

Another option is to use the csvclean tool from the csvkit toolset to filter invalid rows out of the CSV file.

Published a year ago in #programming about #golang and #csv by Denys Golotiuk

