Implicit classes / implicit conversions:
Suppose you have a Spark RDD and you want to call rdd.groupByKey()… It will fail to compile because the RDD class does not have a groupByKey() method.
Add an import to org.apache.spark.rdd.PairRDDFunctions, and it will magically compile.
PairRDDFunctions is a class that defines extra functions you can use on an RDD. The class definition for PairRDDFuncitons is honestly rather confusing… But here is a simpler example that shows how implicit classes can add additional functionality to a class:
After adding an import of StringExtensions, you can now write code like “12345”.isNumeric and it will compile even though that is not a method in java.lang.String. This is similar to C#’s extension methods.
Another handy use of this that you may come across is Java/Scala interoperability, to convert between the Java and Scala collection types.
Let’s say we’re in Java 7 (no streams, boo hoo) and want to do some functional programming on a java.util.List.
Maybe it’s a List<String> and we want to find all the numeric strings and do some crazy map, like for each string, get a tuple of the reversed string and the summation of the digits in the string (not sure why we would ever need to do this). Then we will return it back as a list of the numbers.
If we import scala.collection.JavaConverters._, then we can do this in one (long) line:
Here, asScala is an implicit conversion from a Java List to a Scala List, isNumeric is the implicit method we defined above, and reverse is defined in java.lang.String.
The map function turns the string into a 2-tuple of the reversed string and the summation of the digits in the string (done using foldLeft, but not the only way to do it).
The map returns a Scala sequence, so .asJava converts it back to a Java List.
If you import scala.collection.JavaConversions._ instead (-ions instead of -ers), then you can remove the .asScala and .asJava, and it will still work, but doing the conversions explicitly improves readability.
Implicit parameters
Looking in the Spark documentation, it says you can define an accumulator of type double, given a Spark context, by doing context.accumulator(0.0).
But this does not work unless you add an import to org.apache.SparkContext._ -- this is because, although the SparkContext class does have an accumulator method, it takes in additional parameters, which are marked as implicit. The SparkContext object defines implicit objects of the types needed by the accumulator functions. So once you import those and get them in scope, you can call accumulator with a single parameter, and it will fill in the additional parameter(s) based on the implicit variables in scope.
Hey, wow, small world. Never thought I'd hear of another Scala programmer in Westminster. I live right down the road not far from the Reese fire hall.
ReplyDeleteIf you want to grab a coffee in town and talk code sometime shoot me an email at craigwblake @ the gmail.
PS. You've got some nice posts here!