
process csv in scala


If you have a simple CSV file, an alternative would be not to use any CSV library at all, but simply to parse it yourself in Scala, for example:

```scala
case class Stock(line: String) {
  val data = line.split(",")
  val date = data(0)
  val open = data(1).toDouble
  val high = data(2).toDouble
  val low = data(3).toDouble
  val close = data(4).toDouble
  val volume = data(5).toDouble
  val adjClose = data(6).toDouble
  def price: Double = low
}

scala> import scala.io._

scala> Source.fromFile("stock.csv").getLines() map (l => Stock(l))
res0: Iterator[Stock] = non-empty iterator

scala> res0.toSeq
res1: Seq[Stock] = List(Stock(2010-03-15,37.90,38.04,37.42,37.64,941500,37.64), Stock(2010-03-12,38.00,38.08,37.66,37.89,834800,37.89) //etc...
```

This has the advantage that you can use the full Scala collections API.
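For instance, once each line is wrapped in a `Stock`, the ordinary collection operations apply directly. A self-contained sketch (using hardcoded sample lines instead of a file, and a `Stock` trimmed to the fields actually used here):

```scala
// Same Stock wrapper idea as above, trimmed to two fields.
case class Stock(line: String) {
  private val data = line.split(",")
  val date: String = data(0)
  val close: Double = data(4).toDouble
}

val lines = List(
  "2010-03-15,37.90,38.04,37.42,37.64,941500,37.64",
  "2010-03-12,38.00,38.08,37.66,37.89,834800,37.89"
)
val stocks = lines.map(Stock(_))

// Ordinary collection operations work on the parsed records.
val highestClose = stocks.maxBy(_.close)              // the 2010-03-12 record
val avgClose = stocks.map(_.close).sum / stocks.size  // 37.765
```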

If you prefer to use parser combinators, there's also an example of a CSV parser combinator on GitHub.


The if statement after the while is redundant; you've already ensured that aLine is not null.

Also, I don't know exactly what aLine contains, but you probably want to do something like

```scala
aLine.zipWithIndex.foreach(i => prep.setString(i._2 + 1, i._1))
```

instead of counting up by hand from 1 to 7. Or alternatively, you can

```scala
// JDBC parameters are 1-based, but the array is 0-based, hence i - 1.
for (i <- 1 to 7) { prep.setString(i, aLine(i - 1)) }
```
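To see how the indexing lines up, here's a small self-contained check. `RecordingStatement` is a hypothetical stub standing in for the real JDBC `PreparedStatement`, and the sample row is made up:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stub in place of a JDBC PreparedStatement; it just records
// which (parameterIndex, value) pairs would be set.
class RecordingStatement {
  val calls: ListBuffer[(Int, String)] = ListBuffer.empty
  def setString(i: Int, v: String): Unit = calls += ((i, v))
}

val aLine = Array("2010-03-15", "37.90", "38.04", "37.42", "37.64", "941500", "37.64")
val prep = new RecordingStatement

// zipWithIndex yields 0-based positions; JDBC parameters are 1-based,
// hence the + 1.
aLine.zipWithIndex.foreach { case (v, i) => prep.setString(i + 1, v) }
```

Either style produces the same seven `setString` calls, numbered 1 through 7.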

If you felt like adopting a more functional style, you could probably replace the while with

```scala
Iterator.continually(reader.readNext()).takeWhile(_ != null).foreach { aLine =>
  // Body of while goes here
}
```

(and also remove the var aLine). But using the while is fine. One could also refactor to avoid the lastSymbol (e.g. by using a recursive def), but I'm not really sure that's worth it.
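As a self-contained illustration of that `Iterator.continually` pattern, here is the same idea using a plain `BufferedReader` over an in-memory string in place of the CSV reader; `readLine()`, like `readNext()`, returns null when the input is exhausted:

```scala
import java.io.{BufferedReader, StringReader}

// Stand-in for the CSV reader: three comma-separated rows in memory.
val reader = new BufferedReader(new StringReader("a,1\nb,2\nc,3"))

val rows = Iterator
  .continually(reader.readLine())  // keep calling readLine()...
  .takeWhile(_ != null)            // ...until it signals end of input
  .map(_.split(",").toList)
  .toList
// rows == List(List("a", "1"), List("b", "2"), List("c", "3"))
```

No mutable `var` is needed; the termination condition lives in `takeWhile`.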


If you want to parse it in Scala, the built-in parser combinators are quite powerful, and once you get the hang of them, pretty easy. I'm no expert, but with a few spec tests, this proved to be functional:

```scala
import scala.util.parsing.combinator.RegexParsers

object CSVParser extends RegexParsers {
  def apply(f: java.io.File): Iterator[List[String]] =
    io.Source.fromFile(f).getLines().map(apply(_))

  def apply(s: String): List[String] = parseAll(fromCsv, s) match {
    case Success(result, _) => result
    case failure: NoSuccess => throw new Exception("Parse Failed")
  }

  def fromCsv: Parser[List[String]] = rep1(mainToken) ^^ { case x => x }
  def mainToken = (doubleQuotedTerm | singleQuotedTerm | unquotedTerm) <~ ",?".r ^^ { case a => a }
  def doubleQuotedTerm: Parser[String] = "\"" ~> "[^\"]+".r <~ "\"" ^^ { case a => ("" /: a)(_ + _) }
  def singleQuotedTerm = "'" ~> "[^']+".r <~ "'" ^^ { case a => ("" /: a)(_ + _) }
  def unquotedTerm = "[^,]+".r ^^ { case a => ("" /: a)(_ + _) }

  override def skipWhitespace = false
}
```

It's perhaps not what I would consider a feature-complete solution, and I'm not sure how it would handle UTF-8 etc., but it seems to work at least for ASCII CSVs that have quotes.