
process csv in scala


If you have a simple CSV file, an alternative would be not to use any CSV library at all, but simply to parse it yourself in Scala, for example:

```scala
case class Stock(line: String) {
  val data = line.split(",")
  val date = data(0)
  val open = data(1).toDouble
  val high = data(2).toDouble
  val low = data(3).toDouble
  val close = data(4).toDouble
  val volume = data(5).toDouble
  val adjClose = data(6).toDouble
  def price: Double = low
}

scala> import scala.io._

scala> Source.fromFile("stock.csv").getLines() map (l => Stock(l))
res0: Iterator[Stock] = non-empty iterator

scala> res0.toSeq
res1: Seq[Stock] = List(Stock(2010-03-15,37.90,38.04,37.42,37.64,941500,37.64), Stock(2010-03-12,38.00,38.08,37.66,37.89,834800,37.89) //etc...
```

This has the advantage that you can use the full Scala collections API.
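For instance, once each line is wrapped in a `Stock`, the ordinary collection operations apply directly. A self-contained sketch (using hardcoded sample lines instead of a file, and a `Stock` trimmed to the fields actually used here):

```scala
// Same Stock wrapper idea as above, trimmed to two fields.
case class Stock(line: String) {
  private val data = line.split(",")
  val date: String = data(0)
  val close: Double = data(4).toDouble
}

val lines = List(
  "2010-03-15,37.90,38.04,37.42,37.64,941500,37.64",
  "2010-03-12,38.00,38.08,37.66,37.89,834800,37.89"
)
val stocks = lines.map(Stock(_))

// Ordinary collection operations work on the parsed records.
val highestClose = stocks.maxBy(_.close)              // the 2010-03-12 record
val avgClose = stocks.map(_.close).sum / stocks.size  // 37.765
```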

If you prefer to use parser combinators, there's also an example of a CSV parser combinator on GitHub.


The if statement after the while is redundant; you've already ensured that aLine is not null.

Also, I don't know exactly what aLine contains, but you probably want to do something like

```scala
aLine.zipWithIndex.foreach(i => prep.setString(i._2 + 1, i._1))
```

instead of counting up by hand from 1 to 7. Or alternatively, you can

```scala
// JDBC parameters are 1-based, but the array is 0-based, hence i - 1.
for (i <- 1 to 7) { prep.setString(i, aLine(i - 1)) }
```
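To see how the indexing lines up, here's a small self-contained check. `RecordingStatement` is a hypothetical stub standing in for the real JDBC `PreparedStatement`, and the sample row is made up:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stub in place of a JDBC PreparedStatement; it just records
// which (parameterIndex, value) pairs would be set.
class RecordingStatement {
  val calls: ListBuffer[(Int, String)] = ListBuffer.empty
  def setString(i: Int, v: String): Unit = calls += ((i, v))
}

val aLine = Array("2010-03-15", "37.90", "38.04", "37.42", "37.64", "941500", "37.64")
val prep = new RecordingStatement

// zipWithIndex yields 0-based positions; JDBC parameters are 1-based,
// hence the + 1.
aLine.zipWithIndex.foreach { case (v, i) => prep.setString(i + 1, v) }
```

Either style produces the same seven `setString` calls, numbered 1 through 7.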

If you felt like adopting a more functional style, you could probably replace the while with

```scala
Iterator.continually(reader.readNext()).takeWhile(_ != null).foreach { aLine =>
  // Body of while goes here
}
```

(and also remove the var aLine). But using the while is fine. One could also refactor to avoid the lastSymbol (e.g. by using a recursive def), but I'm not really sure that's worth it.
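As a self-contained illustration of that `Iterator.continually` pattern, here is the same idea using a plain `BufferedReader` over an in-memory string in place of the CSV reader; `readLine()`, like `readNext()`, returns null when the input is exhausted:

```scala
import java.io.{BufferedReader, StringReader}

// Stand-in for the CSV reader: three comma-separated rows in memory.
val reader = new BufferedReader(new StringReader("a,1\nb,2\nc,3"))

val rows = Iterator
  .continually(reader.readLine())  // keep calling readLine()...
  .takeWhile(_ != null)            // ...until it signals end of input
  .map(_.split(",").toList)
  .toList
// rows == List(List("a", "1"), List("b", "2"), List("c", "3"))
```

No mutable `var` is needed; the termination condition lives in `takeWhile`.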


If you want to parse it in Scala, the built-in parser combinators are quite powerful, and once you get the hang of them, pretty easy. I'm no expert, but with a few spec tests, this proved to be functional:

```scala
import scala.util.parsing.combinator.RegexParsers

object CSVParser extends RegexParsers {
  def apply(f: java.io.File): Iterator[List[String]] =
    io.Source.fromFile(f).getLines().map(apply(_))

  def apply(s: String): List[String] = parseAll(fromCsv, s) match {
    case Success(result, _) => result
    case failure: NoSuccess => throw new Exception("Parse Failed")
  }

  def fromCsv: Parser[List[String]] = rep1(mainToken) ^^ { case x => x }
  def mainToken = (doubleQuotedTerm | singleQuotedTerm | unquotedTerm) <~ ",?".r ^^ { case a => a }
  def doubleQuotedTerm: Parser[String] = "\"" ~> "[^\"]+".r <~ "\"" ^^ { case a => ("" /: a)(_ + _) }
  def singleQuotedTerm = "'" ~> "[^']+".r <~ "'" ^^ { case a => ("" /: a)(_ + _) }
  def unquotedTerm = "[^,]+".r ^^ { case a => ("" /: a)(_ + _) }

  override def skipWhitespace = false
}
```

It's perhaps not what I would consider a feature-complete solution, and I'm not sure how it would handle UTF-8 etc., but it seems to work at least for ASCII CSVs that have quotes.