KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.
I'm using scala professionally at this point, so I thought we should have a thread for it and the stupid questions I constantly have.

What is scala?

quote:

Scala is an acronym for “Scalable Language”. This means that Scala grows with you. You can play with it by typing one-line expressions and observing the results. But you can also rely on it for large mission critical systems, as many companies, including Twitter, LinkedIn, or Intel do.

Scala is a type-safe functional language that runs on top of the JVM and so can access the full range of existing Java libraries. Scala calls itself object-functional: it supports common functional programming constructs (functions that take or return functions, monads, efficient recursion, immutable types, etc.) while also having a strong type system with modern OO features. Functions are themselves objects, in that:

code:
def timesTwo(x: Int): Int = x * 2
val four: Int = timesTwo(2)
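Behaves at the call site like the following (strictly, a def is a method rather than an object; it is function values such as val f = (x: Int) => x * 2 that are objects with an apply method, and f(2) is sugar for f.apply(2)):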

code:
object timesTwo {
  def apply(x: Int): Int = x * 2
}
val four: Int = timesTwo(2)
Who cares?

Where scala seems to really excel is in distributed environments (where immutability prevents race conditions rather than just annoying programmers) and in asynchronous environments, where functional constructs like Future allow you to build non-blocking systems without explicitly declaring everything as a callback method; you just let the compiler figure it all out.

For example:

code:
def transformDatabaseRow(id: Int, f: (RawRow) => ProcessedRow): Future[ProcessedRow] = {
  val row: Future[RawRow] = getFromDatabase(id)
  row.map(f)
}
This is a function that takes an id and a function f as arguments, retrieves the row for id from the database, and applies f to the row. As written, however, transformDatabaseRow returns immediately with a value that will at some point hold the processed data. The passed function hasn't been called yet, but will be at some point after the data becomes available. Future is a container class (a monad) for data that will exist at some point, and map tells the Future to apply f to its contents once they become available. Your code (f) doesn't need to care about the asynchronous nature of its application at all.
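To make the "returns immediately" point concrete, here is a minimal runnable sketch of the same function. RawRow, ProcessedRow, and getFromDatabase are hypothetical stand-ins I made up so it compiles; the real ones would come from whatever database layer you use.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the types used in the post
case class RawRow(id: Int, payload: String)
case class ProcessedRow(id: Int, payload: String)

// Hypothetical database call: the body runs on another thread
def getFromDatabase(id: Int): Future[RawRow] =
  Future { RawRow(id, s"row-$id") }

def transformDatabaseRow(id: Int, f: RawRow => ProcessedRow): Future[ProcessedRow] =
  getFromDatabase(id).map(f) // f runs only once the row is available

// The caller gets a Future back right away
val result: Future[ProcessedRow] =
  transformDatabaseRow(7, raw => ProcessedRow(raw.id, raw.payload.toUpperCase))

// Blocking is only here so the demo can look at the value
val processed: ProcessedRow = Await.result(result, 5.seconds)
```

In real code you would keep chaining map/flatMap instead of calling Await.result.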

Places to learn about scala:


Other resources and major libraries:

  • Spark -- distributed HPC environment (like Hadoop but not terrible)
  • Play framework -- Reactive web framework
  • Akka Actors -- concurrent, asynchronous, and distributed messaging system
  • Slick -- Functional relational mapping -- synchronous database access for a functional language

KernelSlanders fucked around with this message at 23:54 on Jan 10, 2015

KernelSlanders

PlesantDilemma posted:

Scala seems pretty cool. I'm getting into more Java these days and out of the php slums, Scala looks like it could be a fun next language to tackle.

KernelSlanders, what do you use Scala for at your work? How much Java did you know before you got into Scala?

As an outsider, here are my preconceptions of Scala:
  • popular for backend things
  • somewhat popular for web things
  • people complain that the language is too complex
  • I get confused every time I see val four: Int because I want to see the type come before the variable name

As Fullets suggested, I use it for bigdataish tasks predominately: spark jobs that do product or user clustering, large scale ML, some NLP stuff, etc. It's basically used to productize data science tasks that run through cron. We also use it for internal microservice components of our web service. I've also just started using it in script mode for one-off jobs and data exploration that I used to do in python/numpy/pandas, although that's mostly to take advantage of other scala libraries we've written.

I haven't done much Java since school, although I did do some C#, which is syntactically quite similar. I fall into the "like it a lot more than Java" camp. I would say it's not so much that the language is too complex as that the syntax can be cryptic. I think in general it can be pretty expressive if you want it to be, although there are clearly some cases where the compiler demands syntax it shouldn't need. I think context bounds are a good example of that.
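For anyone who hasn't run into context bounds, here is a small sketch of the syntax in question. The function names are made up for illustration, and this is only my guess at the kind of thing that reads as cryptic.

```scala
// Context-bound form: "T: Ordering" asks for an implicit Ordering[T] to be in scope
def largest[T: Ordering](xs: List[T]): T = xs.max

// Roughly what the compiler expands it to: an extra implicit parameter list
def largestExplicit[T](xs: List[T])(implicit ord: Ordering[T]): T = xs.max(ord)

val biggestInt = largest(List(3, 1, 2))
val biggestString = largestExplicit(List("b", "a"))
```

With the context-bound form the Ordering has no name, so if you need to call it directly you have to fetch it with implicitly[Ordering[T]].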

The type-after-variable-name syntax doesn't really bother me. I think of it as a decorator since most of the time the compiler can figure out the type implicitly and you can just leave the declaration off. Python 3 uses the same syntax for type hinting.

KernelSlanders

Cryolite posted:

I'm learning Scala coming from C# and this is exactly the type of stuff I'd like to eventually be doing too, bonus points if it's in Scala. Is your company by any chance in the Baltimore/DC area? :allears:

I really, really want to jump ship off of .NET and maybe into a position writing Scala next, but it looks like there aren't too many companies looking for it (or at least near me).

NYC, sorry.

I think if you want a non-.NET language to break into big data projects with, you should probably start with Python, to be honest. Although if you want to stay in the Baltimore/DC/Nova area you should probably look up what BAH uses. You can certainly do a lot worse than scala, and I don't agree with triple sulk's suggestion of Haskell or Clojure, but he's right that very few companies are actively hiring scala developers at this point. In fact, we use "get to learn scala" as a recruiting tool since there's a perception that it may be the Next Big Thing. We've recruited a couple of experienced scala devs, but one was one of the early contributors to some core language features. That said, scala will get attention on a resume, at least from us.

KernelSlanders
A common programming task I run into at work is needing to process all the elements of a list into a reduced list of fewer elements, such as converting web log events to sessions. We've been debating the merits of two different design patterns and wanted to see what others thought. As an example I'll use taking a string and converting it to a list of tuples that count the repeated occurrences of a character in a given run. So "Foocookies!!!" becomes List((F,1), (o,2), (c,1), (o,2), (k,1), (i,1), (e,1), (s,1), (!,3)).

The first pattern is a tail recursive method that takes two arguments: first the list of already processed elements, and second the list of elements yet to be processed.

code:
def countRuns(str: String): List[(Char, Int)] = {
  // runs: runs found so far, most recent first; letters: characters still to process
  @annotation.tailrec
  def countRunsImpl(runs: List[(Char, Int)], letters: List[Char]): List[(Char, Int)] = {
    if (letters.isEmpty) runs
    else if (runs.nonEmpty && runs.head._1 == letters.head)
      countRunsImpl((runs.head._1, runs.head._2 + 1) +: runs.tail, letters.tail)
    else countRunsImpl((letters.head, 1) +: runs, letters.tail)
  }

  // Starting from an empty accumulator means the empty string no longer throws
  countRunsImpl(Nil, str.toList).reverse
}
The other method is to use foldLeft with a method that takes the already processed elements and the next element to process.

code:
def countRuns(str: String): List[(Char, Int)] = {
  // runs: runs found so far, most recent first; letter: next character to process
  def countRunsImpl(runs: List[(Char, Int)], letter: Char): List[(Char, Int)] = {
    if (runs.nonEmpty && runs.head._1 == letter) (runs.head._1, runs.head._2 + 1) +: runs.tail
    else (letter, 1) +: runs
  }

  // An empty initial accumulator means the empty string no longer throws
  str.toList.foldLeft(List.empty[(Char, Int)])(countRunsImpl).reverse
}
I don't think there's any performance reason to favor one over the other; it's mostly about readability. Which do you think is clearer? Is there a third pattern I'm not seeing?

e: my lines were too long or something

KernelSlanders fucked around with this message at 21:41 on Jan 31, 2015

KernelSlanders
Yeah, foldRight is probably cleaner than foldLeft. I purposely didn't use an anonymous function because in reality the logic of the Implementation method would be substantially more complex.
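For reference, a foldRight version of the run counter might look roughly like this. It's a sketch, not necessarily what was posted upthread, and it uses an anonymous function only to keep it short.

```scala
def countRuns(str: String): List[(Char, Int)] =
  str.toList.foldRight(List.empty[(Char, Int)]) {
    // Same character as the run at the head of the accumulator: extend that run
    case (c, (d, n) :: rest) if c == d => (d, n + 1) :: rest
    // Otherwise start a new run in front of whatever has been built so far
    case (c, runs) => (c, 1) :: runs
  }

val runs = countRuns("Foocookies!!!")
```

Because foldRight builds the result from the right-hand end, the runs come out in order and the trailing .reverse goes away.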

KernelSlanders
That doesn't really work though since foldRight isn't tail recursive (See: LinearSeqOptimized.scala) and using it on a stream will cause the whole stream to be read into memory. Worse yet it will do so one element per stack frame resulting in a stack overflow if the list is sufficiently large.

KernelSlanders

Volguus posted:

Yea, have fun googling when you have a problem :)

I was trying to see today if there's a subtle difference between ++ and ::: on List. Google doesn't like ::: as a search query, it seems.

That said, being able to write val list = item1 :: item2 :: item3 :: subList1 ++ subList2 is pretty convenient.
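A quick sketch of how that expression groups, with made-up values. As far as I know, for two Lists ++ and ::: give the same result; ::: is List-only and right-associative, while ++ is the general collection method.

```scala
val subList1 = List(4, 5)
val subList2 = List(6, 7)

// ++ binds tighter than ::, so this is 1 :: (2 :: (3 :: (subList1 ++ subList2)))
val list = 1 :: 2 :: 3 :: subList1 ++ subList2

// On two Lists the two concatenation operators agree
val viaTripleColon = subList1 ::: subList2
val viaPlusPlus = subList1 ++ subList2
```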

KernelSlanders

sink posted:

I get the sense that corporations are making more strong bets on the JVM than the CLR. I don't have numbers to prove that, but this is my reflection from job hunting in NYC and SF. Although I do run into a surprising number of .NET shops here and there. Stackoverflow and Zocdoc being two headliners. Do you know if they are jumping into F#?

My experience has been the same. I think generally startups want to avoid the Microsoft stack since it costs considerably more up front. An m3 medium instance on AWS running the windows stack is three times the cost of one running linux. Also, linux/akka-http/postgres is only one step removed from linux/django/postgres, making it somewhat easier to dabble in, for example for micro-services, whereas CLR requires a completely new stack.

KernelSlanders
Slick was a long series of headaches for us and we ended up just rolling our own.

KernelSlanders
Depending on the complexity of your project, here's something to get you started if you change your mind.

code:
import java.sql.Connection

case class Foo(name: String, value: Int)

object Foo {

  def fetchFromDb(conn: Connection): List[Foo] = {
    val sql = "SELECT name, value FROM foo_table"
    val rs = conn.prepareStatement(sql).executeQuery()

    // var, not val: the list is reassigned once per row
    var output: List[Foo] = List[Foo]()
    while (rs.next()) output = output :+ {
      val name = rs.getString("name")
      val value = rs.getInt("value")
      Foo(name, value)
    }

    output
  }
}
Obviously there are numerous improvements that could be made: abstract the column fetchers, return a stream instead of a list, support nulls through Option outputs, etc., but if you're just trying to get some data out, this should get you started.

KernelSlanders
Basically, you need to use MongoDB if you want true reactive DB access.

The other option is to wrap your DB access in a Finagle service and then access that in an asynchronous way, although that's really just hiding the problem.

KernelSlanders
Spark 1.3 supposedly has an R or Pandas like dataframe. Has anyone used it yet?

KernelSlanders

Steve French posted:

If my situation is unique to the Bay Area...

It's not. It is somewhat unique to tech companies though. When an airline hires programmers they tend to do it through an HR department filled with people who print out forms to fax to people and they like simple rules like "ten years of python experience."

KernelSlanders

sink posted:

Virtually every operator has an alpha alias. I'm inclined to agree with you in that I don't particularly care for symbolic soup, but I think it's fairly subjective or contextual. I would rather see a method named ~> than naturalTransformation.

Why use two characters when you can use one like ↝?

def Σ(x: List[Int]): Int = (0 /: x) (_ + _)

KernelSlanders fucked around with this message at 05:15 on Mar 31, 2015

KernelSlanders
+ isn't really an operator in scala, just an infix notation method call. (_+_) is syntactic sugar for (a, b) => a.+(b)
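Spelled out as a runnable sketch with made-up values. (0 /: x)(_ + _) is the symbolic spelling of x.foldLeft(0)(_ + _), so I've used foldLeft here.

```scala
val xs = List(1, 2, 3, 4)

// Placeholder syntax: each underscore stands for one argument, in order
val sugared = xs.foldLeft(0)(_ + _)

// The same thing with the method call on Int written out
val explicit = xs.foldLeft(0)((a, b) => a.+(b))
```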

KernelSlanders

jpotts posted:

It's the same where I work, but most of the OOP guys are slowly being converted to FP. I can appreciate both sides of the argument, but in our case the existing OOP implementations are horrible, multi-layered abstractions that are incredibly obtuse. That's not to say that OOP is horrible, but holy poo poo do I hate layers...and the cake pattern. :cripes:

Honestly, the myriad of styles possible in Scala is its worst attribute. There are just too many ways to write some horribly confusing code.

Also, I wish Haskell people didn't talk so drat much about Haskell.

Cake pattern is terrible. I don't understand why anyone would choose it over constructor injection.
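By constructor injection I just mean passing dependencies in as constructor arguments. A minimal sketch, with every name here made up for illustration:

```scala
// The dependency is described by a plain trait
trait UserRepository {
  def find(id: Int): Option[String]
}

// One implementation; a test could pass in a different one
class InMemoryUserRepository(users: Map[Int, String]) extends UserRepository {
  def find(id: Int): Option[String] = users.get(id)
}

// The consumer receives its dependency through the constructor
class Greeter(repo: UserRepository) {
  def greet(id: Int): String =
    repo.find(id).map(name => s"Hello, $name").getOrElse("Who are you?")
}

val greeter = new Greeter(new InMemoryUserRepository(Map(1 -> "goon")))
```

There are no self-types or layered traits to untangle; the wiring is a single new expression.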

KernelSlanders
I've been pretty happy with spray for my toy web apps although serving the original html is a bit clunky.

KernelSlanders
Has anyone had any success using Spark's DataFrame object? Am I missing something or is the whole thing just a horrendously designed API that can't possibly be useful for anything? Like in pandas you can df['c'] = df['a'] + df['b']. Is there a simple way to do that in spark Dataframes? What about df['idx'] = df.id.map(lambda x: np.where(ids == x)[0][0])?

KernelSlanders
Generally, you don't want to use the Java library if there's a Scala library that does the job. The interfaces just tend to be clunky in Scala code. I'm not sure why you feel String.join(",", a) is cleaner than a.mkString(",") but if you really want to do so, you can always define your own function.

code:
scala> def join[T](list: ArrayBuffer[T], delim: String): String = list.mkString(delim)
join: [T](list: scala.collection.mutable.ArrayBuffer[T], delim: String)String

scala> join(a, ",")
res1: String = one,two,three
I would, though, argue for a.mkString(",") as a much more Scala-like way of expressing the idea. We love calling methods on collections. You could imagine a def mkString(d: String) = this.tail.foldLeft(this.head){ (l, r) => l + d + r.toString }
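That imagined definition, made runnable as a standalone function. This is a sketch of the idea only, not how the standard library actually implements mkString, and mkStringSketch is a made-up name.

```scala
def mkStringSketch[T](items: Seq[T], d: String): String =
  if (items.isEmpty) ""
  else items.tail.foldLeft(items.head.toString) { (l, r) => l + d + r.toString }

val joined = mkStringSketch(Seq("one", "two", "three"), ",")
```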

KernelSlanders
Stream is an attempt at a functional approach to something like the C# yield return although it's probably closer to a generator comprehension in python. That's probably the closest to idiomatic of the approaches you listed. Does this example make more sense?

code:
val integers: Stream[Int] = 0 #:: integers.map(_ + 1)
Basically, it's a lazily evaluated list. The stream integers is defined (recursively) as the element 0 followed by all the elements of integers plus one. So the zeroth element is zero (per the initial case), then the first element of integers (which is the zeroth element of the thing to the right of the #::) is the zeroth element of integers plus one.
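You can watch the laziness by only asking for a few elements. In this sketch I've written lazy val so the definition also works inside a method body; that's my tweak, not part of the line above. Also note that Stream is deprecated in favor of LazyList from Scala 2.13 on.

```scala
lazy val integers: Stream[Int] = 0 #:: integers.map(_ + 1)

// Only the first five elements are ever computed
val firstFive = integers.take(5).toList
```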

By the way, your implementation of fib2 isn't great because that scanLeft will end up recomputing the entire sequence for every element. Try this to see why:

code:
def addThem(a: Int, b: Int): Int = {
  println(s"add> $a + $b = ${a+b}")
  a + b
}

def fib2(): Stream[Int] = 0 #:: fib2().scanLeft(1)(addThem)

fib2().take(10).foreach(println(_))
There's a better implementation in the Stream scaladocs. If it's an interview question though, you probably can't go wrong with the tail-recursive approach.
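For comparison, here is a sketch of both alternatives: a memoized stream along the lines of the scaladoc example (written from memory, so treat the exact form as approximate) and a tail-recursive version. The names fibs and fib are mine.

```scala
// Memoized: fibs is a value, so each element is computed once and reused
lazy val fibs: Stream[BigInt] =
  BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }

// Tail-recursive: constant stack, no intermediate collection
def fib(n: Int): BigInt = {
  @annotation.tailrec
  def loop(i: Int, a: BigInt, b: BigInt): BigInt =
    if (i == 0) a else loop(i - 1, b, a + b)
  loop(n, BigInt(0), BigInt(1))
}

val firstTen = fibs.take(10).toList
```

The difference from fib2 is that fibs is a value rather than a def, so the stream refers back to itself instead of building a fresh stream on every reference.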

KernelSlanders
Yeah, square brackets in scala are basically the same as angle brackets in java. asInstanceOf acts as a cast.
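A tiny sketch of both points, with made-up values:

```scala
// List[Int] in Scala is what List<Integer> would be in Java
val numbers: List[Int] = List[Int](1, 2, 3)

// asInstanceOf[String] plays the role of a Java (String) cast
val boxed: Any = "goon"
val unboxed: String = boxed.asInstanceOf[String]
```

Like a Java cast, asInstanceOf throws ClassCastException if the value isn't actually that type, so a pattern match is usually the safer choice.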

KernelSlanders
Has anyone used Ammonite? What did you think of it?

KernelSlanders

KICK BAMA KICK posted:

I think the intentions of this are pretty clear; a case class (more just to convey the intention of immutability and for the free equals and hashCode than for pattern matching) that is only constructed from the companion's parameterless factories.

When I defined apply without the parentheses and tested it in the REPL, val f = Foo() threw an error because it couldn't resolve the ambiguity between the helper's apply method and the class's constructor. This doesn't make any sense to me, since a) if I don't supply an argument, isn't it obvious that I'm calling the one that doesn't take an argument? and b) that constructor is supposed to be private anyway.

So my questions are 1) why? 2) is that how you implement what I'm trying to do? and 3) I guess whatever the convention is on factory methods that take no arguments and otherwise have no side effects, would you define and call otherFactory with parentheses just to match the apply version? I know it doesn't make a difference but it would bug me.

It's hard to know precisely what's going on, but I suspect you overloaded a factory method with the same number of arguments as the one created for you with the case keyword. This code works fine for me:

code:
case class Foo private (name: String, value: Int) {
  def plusInt(n: Int): Int = value + n
}

object Foo {
  def apply(): Foo = new Foo("zero", 0)
}
Then:
code:
scala> val foo = Foo("three", 3)
foo: Foo = Foo(three,3)

scala> val foo = Foo()
foo: Foo = Foo(zero,0)
Don't forget that if you're using default arguments or the _* operator, then the case class syntactic sugar is defining more factory methods than you probably realize. Of course in my case Foo.zero would probably have been more idiomatic than Foo().
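The Foo.zero variant would look something like this sketch, reusing the Foo from the example above:

```scala
case class Foo private (name: String, value: Int) {
  def plusInt(n: Int): Int = value + n
}

object Foo {
  // A named constant instead of a parameterless apply
  val zero: Foo = new Foo("zero", 0)
}

val three = Foo.zero.plusInt(3)
```

A named value sidesteps the question of which apply overload Foo() resolves to.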

KernelSlanders

FamDav posted:

might be a coding horror, but is there a programmatic way to get the concrete parent class of an anonymous class?

i.e. given

val x = new Thing with SomeTraitAndNowImAnonymous

is there a method on x I can call that will return Thing?

When I first started learning scala I found myself constantly at war with the type system. I've learned to really appreciate its power since. In general, if you find yourself fighting the type checker, you made a mistake earlier on in your design. It's hard to know what exactly the best solution for you is without the use case (implicit ClassTags?). That said, there is almost certainly something better than java reflection, especially given that once your class goes in a container, type erasure breaks reflection.

Sedro posted:

Plain old Java reflection will work: x.getClass.getSuperclass

This works for the simple case FamDav described, but for an ad hoc anonymous class it's likely to be much less useful:

code:
scala> val x = new Iterator[Int] {
     |   var count = 0
     |   def hasNext = true
     |   def next = { count += 1; count }
     | }
x: Iterator[Int]{def count: Int; def count_=(x$1: Int): Unit} = non-empty iterator

scala> x.getClass
res0: Class[_ <: Iterator[Int]] = class $anon$1

scala> x.getClass.getSuperclass
warning: there were 1 feature warning(s); re-run with -feature for details
res1: Class[?0] forSome { type ?0 >: ?0; type ?0 <: Iterator[Int] } = class java.lang.Object

KernelSlanders
I commonly have to match on an Option of an instance of a sealed trait. It ends up being an awkward pattern and it really seems like there should be a cleaner way to do it. Suppose I have some code like:

code:
sealed trait Light
case class RedLight(timeLeft: Double) extends Light
case class GreenLight(timeLeft: Double) extends Light

def timeToNextGreen(light: Option[Light]) = light match {
  case Some(RedLight(t)) => t
  case Some(GreenLight(t)) => t + 5.0
  case None => 0.0
}
This nested unapply works perfectly fine. However, sometimes you don't want to run an extractor, you need the object itself. I end up doing something like:

code:
def doStuff(light: Option[Light]) = {
  val something = light.map {
    case rl: RedLight => doSomething(rl)
    case gl: GreenLight => doSomethingElse(gl)
  }

  something.getOrElse(thatOtherThing)
}
or

code:
def doStuff(light: Option[Light]) = light match {
  case Some(x) => 
    x match {
      case rl: RedLight => doSomething(rl)
      case gl: GreenLight => doSomethingElse(gl)
    }
  case None => thatOtherThing 
}
Neither of those is particularly satisfying.

KernelSlanders
Ah of course. I've used @ but didn't realize you could nest it.
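For anyone reading along, the nested form looks roughly like this. The handler bodies are placeholders I made up so the sketch compiles.

```scala
sealed trait Light
case class RedLight(timeLeft: Double) extends Light
case class GreenLight(timeLeft: Double) extends Light

// Placeholder handlers standing in for doSomething / doSomethingElse
def doSomething(rl: RedLight): String = s"red for ${rl.timeLeft}"
def doSomethingElse(gl: GreenLight): String = s"green for ${gl.timeLeft}"
val thatOtherThing: String = "no light"

def doStuff(light: Option[Light]): String = light match {
  case Some(rl @ RedLight(_)) => doSomething(rl)     // @ binds the whole object inside Some
  case Some(gl: GreenLight)   => doSomethingElse(gl) // a typed pattern nests the same way
  case None                   => thatOtherThing
}
```

Either spelling hands you the object itself and collapses the two-level match into one.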

KernelSlanders

Hughlander posted:

Not sure about the Scala component but that also sounds like a prime Kafka use case.

Kafka's the queue, not a language in which to write the consumer. If the goal is to be scalable at some point, I'd vote Spark Streaming as the consumer framework.

KernelSlanders

sink posted:

how embarrassing for them: https://www.lightbend.com/blog/typesafe-changes-name-to-lightbend

i understand this is about surviving as a company and thus catering to enterprise clients who are Java first, but stepping back from Scala is a little disheartening. the copy on that lovely website has also become completely impenetrable

This was always going to be a problem for them. The companies that hire consulting firms to build bespoke business software for them are not the ones pushing the adoption of new technologies.

KernelSlanders
Scala School from Twitter is great, but getting quite dated now. There's a scala for java programmers tutorial on the scala-lang web site that's pretty good even if you aren't a java programmer.

KernelSlanders
I would have:

code:
things
  .filter(_.someCondition)
  .map(_.toOtherThing)
Save infix for where it really adds clarity such as operators or DSLs.
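To show the two spellings are the same calls, here's a sketch with made-up data and predicates:

```scala
val things = List(1, 2, 3, 4)

// Dotted chain, one call per line
val dotted = things
  .filter(_ % 2 == 0)
  .map(_ * 10)

// The same calls in infix notation
val infix = things filter (_ % 2 == 0) map (_ * 10)
```

Both produce the same list; the dotted form just makes it more obvious where each call starts and ends.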

KernelSlanders
Is baz a List[String] or is fun(q - foo) a List[String]?