Process CSV in Scala
If you have a simple CSV file (no quoted fields or embedded commas), an alternative is to skip the CSV library altogether and parse the file directly in Scala, for example:
    case class Stock(line: String) {
      val data = line.split(",")
      val date = data(0)
      val open = data(1).toDouble
      val high = data(2).toDouble
      val low = data(3).toDouble
      val close = data(4).toDouble
      val volume = data(5).toDouble
      val adjClose = data(6).toDouble

      def price: Double = low
    }

    scala> import scala.io._

    scala> Source.fromFile("stock.csv") getLines() map (l => Stock(l))
    res0: Iterator[Stock] = non-empty iterator

    scala> res0.toSeq
    res1: Seq[Stock] = List(Stock(2010-03-15,37.90,38.04,37.42,37.64,941500,37.64), Stock(2010-03-12,38.00,38.08,37.66,37.89,834800,37.89) //etc...
This has the advantage that you can use the full Scala collection API on the parsed rows.
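As a rough illustration of what that buys you, here is a minimal sketch that runs a few ordinary collection operations over parsed rows. The two sample lines are the ones shown in the REPL output above; the `StockStats` object, the trimmed-down `Stock` fields, and the 900000 volume threshold are assumptions made only for this example.

```scala
object StockStats {
  // Same idea as the Stock class above, trimmed to the fields used here.
  case class Stock(line: String) {
    private val data = line.split(",")
    val date: String = data(0)
    val close: Double = data(4).toDouble
    val volume: Double = data(5).toDouble
  }

  // In-memory stand-in for Source.fromFile("stock.csv").getLines()
  val sampleLines: List[String] = List(
    "2010-03-15,37.90,38.04,37.42,37.64,941500,37.64",
    "2010-03-12,38.00,38.08,37.66,37.89,834800,37.89"
  )

  val stocks: List[Stock] = sampleLines.map(l => Stock(l))

  // Plain collection operations on the parsed rows.
  val highestClose: Stock = stocks.maxBy(_.close)
  val totalVolume: Double = stocks.map(_.volume).sum
  val busyDays: List[String] = stocks.filter(_.volume > 900000).map(_.date)
}
```

With a real file you would swap `sampleLines` for the iterator returned by `getLines()`; keep in mind that an `Iterator` can only be traversed once, so call `toList` or `toSeq` first if you need several passes.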
If you prefer to use parser combinators, there's also an example of a CSV parser combinator on GitHub.
The if statement after the while is useless: you've already made sure that aLine is not null.
Also, I don't know exactly what the contents of aLine are, but you probably want to do something like
aLine.zipWithIndex.foreach(i => prep.setString(i._2+1 , i._1))
instead of counting up by hand from 1 to 7. Alternatively, you can write (note the index shift, since statement parameters are 1-based and aLine is 0-based)
for (i <- 1 to 7) { prep.setString(i, aLine(i - 1)) }
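To check that the two forms bind the same values, here is a small sketch under some assumptions: `aLine` is taken to be a zero-based `Array[String]`, and `RecordingStatement` is a hypothetical stand-in for the prepared statement that only records the `setString` calls it receives (it is not a JDBC class).

```scala
import scala.collection.mutable.ListBuffer

object BindDemo {
  // Hypothetical stand-in for a prepared statement; it only records calls.
  final class RecordingStatement {
    val calls: ListBuffer[(Int, String)] = ListBuffer.empty
    def setString(index: Int, value: String): Unit = {
      calls += ((index, value))
      ()
    }
  }

  // Assumed shape of one parsed row.
  val aLine: Array[String] = Array("2010-03-15", "37.90", "38.04")

  // zipWithIndex yields (value, zeroBasedIndex); parameters are 1-based.
  def bindWithZip(): List[(Int, String)] = {
    val prep = new RecordingStatement
    aLine.zipWithIndex.foreach { case (value, i) => prep.setString(i + 1, value) }
    prep.calls.toList
  }

  // Same binding with an explicit 1-based loop.
  def bindWithLoop(): List[(Int, String)] = {
    val prep = new RecordingStatement
    for (i <- 1 to aLine.length) prep.setString(i, aLine(i - 1))
    prep.calls.toList
  }
}
```

The `zipWithIndex` form has the small advantage that it adapts to the row length by itself instead of hard-coding 7.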
If you feel like adopting a more functional style, you could probably replace the while with

    Iterator.continually(reader.readNext()).takeWhile(_ != null).foreach(aLine => {
      // Body of while goes here
    })

(and also remove the var aLine). But using the while is fine. One could also refactor to avoid the lastSymbol (e.g. by using a recursive def), but I'm not really sure that's worth it.
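For what it's worth, here is a self-contained sketch of the `Iterator.continually` pattern. It uses `BufferedReader.readLine()` as a stand-in for `reader.readNext()`, on the assumption that both signal end of input by returning null; the `ReadLoopDemo` object and the sample input are made up for the example.

```scala
import java.io.{BufferedReader, StringReader}

object ReadLoopDemo {
  // Keeps calling readLine() until it returns null; no var, no while.
  def readAll(reader: BufferedReader): List[Array[String]] =
    Iterator
      .continually(reader.readLine())
      .takeWhile(_ != null)
      .map(_.split(","))
      .toList

  // In-memory input so the sketch runs without a file.
  val rows: List[Array[String]] =
    readAll(new BufferedReader(new StringReader("a,b,c\nd,e,f")))
}
```

Because the result is an ordinary iterator, you can chain `map`, `filter` and friends before consuming it, which is the main thing you gain over the while loop.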
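If you want to parse it in Scala, the parser combinators (part of the standard library before Scala 2.11, a separate scala-parser-combinators module since then) are quite powerful and, once you get the hang of them, pretty easy. I'm no expert, but with a few spec tests, this proved to be functional: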
    import scala.util.parsing.combinator.RegexParsers

    object CSVParser extends RegexParsers {
      // Whitespace inside fields is significant, so don't skip it.
      override def skipWhitespace = false

      // Parse every line of a file lazily.
      def apply(f: java.io.File): Iterator[List[String]] =
        io.Source.fromFile(f).getLines().map(apply(_))

      // Parse a single line into its fields.
      def apply(s: String): List[String] = parseAll(fromCsv, s) match {
        case Success(result, _) => result
        case failure: NoSuccess => throw new Exception("Parse Failed")
      }

      def fromCsv: Parser[List[String]] = rep1(mainToken)

      // One field, followed by an optional separating comma.
      def mainToken: Parser[String] =
        (doubleQuotedTerm | singleQuotedTerm | unquotedTerm) <~ ",?".r

      def doubleQuotedTerm: Parser[String] = "\"" ~> "[^\"]+".r <~ "\""
      def singleQuotedTerm: Parser[String] = "'" ~> "[^']+".r <~ "'"
      def unquotedTerm: Parser[String] = "[^,]+".r
    }
It's not what I would consider a feature-complete solution, and I'm not sure how it would handle UTF-8 etc., but it seems to work for ASCII CSVs that have quotes at least.
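On the UTF-8 question: as far as I can tell, the parser itself only ever sees already-decoded strings, so the encoding is decided where the file is read. `Source.fromFile` without an explicit encoding falls back to an implicit codec that depends on the platform default, so passing the encoding explicitly is the safer option. The sketch below only covers that reading step, not the parser; the `Utf8ReadDemo` object, the temp file, and the sample text are assumptions for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object Utf8ReadDemo {
  // Read all lines, decoding the bytes as UTF-8 regardless of platform default.
  def readLines(file: java.io.File): List[String] = {
    val source = Source.fromFile(file, "UTF-8")
    try source.getLines().toList
    finally source.close()
  }

  // Write a small UTF-8 file with non-ASCII characters and read it back.
  def roundTrip(): List[String] = {
    val tmp = Files.createTempFile("utf8-demo", ".csv")
    try {
      Files.write(tmp, "Z\u00fcrich,na\u00efve\n".getBytes(StandardCharsets.UTF_8))
      readLines(tmp.toFile)
    } finally Files.delete(tmp)
  }
}
```

Applied to the parser above, that would mean replacing `io.Source.fromFile(f)` with `io.Source.fromFile(f, "UTF-8")` in the file-based `apply`, assuming your files really are UTF-8.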