Saturday, June 27, 2015

Today I learned... doing things the hard way when languages and OSes are glitchy

So I have some code that works just fine... most of the time. But there has been some strange stuff that doesn't make any sense and isn't easily reproducible.

A few examples I encountered recently were Scala's parallel HashMap and file renaming in Java.

Scala has a nice parallel HashMap that allows you to do gets, puts, and updates concurrently, and it handles the synchronization for you... except when it doesn't.

I put a tuple into a ParHashMap that nothing in my code ever deletes from, and later, when I went to get that value, it wasn't there.

In another instance, when I went to put a tuple into a ParHashMap, I got an ArrayIndexOutOfBoundsException.

I replaced my ParHashMap with a single-threaded HashMap and did my own synchronization around it, and I haven't seen either problem since...
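For illustration, here's a minimal sketch of that kind of replacement: a plain mutable HashMap with explicit synchronization around every access. The class and method names are mine, made up for this example, not my actual production code.

```scala
import scala.collection.mutable

// A minimal thread-safe map wrapper: a single-threaded mutable HashMap
// with explicit synchronization around every operation.
class SynchronizedMap[K, V] {
  private val underlying = mutable.HashMap.empty[K, V]

  def put(key: K, value: V): Unit = underlying.synchronized {
    underlying.update(key, value)
  }

  def get(key: K): Option[V] = underlying.synchronized {
    underlying.get(key)
  }

  // Read-modify-write as one atomic step, which a concurrent map
  // can't give you without compare-and-swap-style loops.
  def modify(key: K)(f: Option[V] => V): Unit = underlying.synchronized {
    underlying.update(key, f(underlying.get(key)))
  }
}
```

Coarse-grained locking like this is slower under heavy contention than a real concurrent map, but it's simple enough to reason about that the strange disappearing-value bugs go away.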


Another issue was renaming a file in Windows using Java. File.renameTo was working just fine, but all of a sudden I came across one occurrence where the file appeared to have been copied instead of renamed. So I changed my code to delete the source file after the rename, if it was still there. But then I got an exception that the source file was already in use, even though I had previously closed my writer that was operating on that file.

I ended up writing the following method to replace what should be a simple one-line file rename:

import java.io.File
import java.nio.file.Files

def rename(filePath: String, targetPath: String): Unit = {
  RetryHelper.retry(10, Seq(classOf[java.nio.file.FileSystemException])) {
    val sourceFile = new File(filePath)
    val targetFile = new File(targetPath)
    if (!targetFile.exists) {
      Files.move(sourceFile.toPath, targetFile.toPath)
    }
    if (sourceFile.exists) {
      sourceFile.delete()
    }
  }(_ => Thread.sleep(50))
}

That's right, the only way I could get it to work every single time, without fail (so far!) was to delete the source file after the "move" (in case it behaved as a copy for some reason), and to wrap that in a retry block that executes up to 10 times and catches FileSystemExceptions with a short sleep before retrying. Yikes!
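In hindsight, one thing worth experimenting with (I haven't battle-tested it in this scenario) is NIO's atomic move option, which fails loudly instead of silently degrading to a copy:

```scala
import java.nio.file.{Files, StandardCopyOption}

// Request an atomic move; if the file system can't do it atomically
// (e.g., across volumes), this throws AtomicMoveNotSupportedException
// instead of quietly behaving like copy-then-delete.
val source = Files.createTempFile("rename-demo", ".txt")
val target = source.resolveSibling("rename-demo-target.txt")
Files.move(source, target, StandardCopyOption.ATOMIC_MOVE)
```

At least with ATOMIC_MOVE you'd get an exception at the point of failure rather than a mystery duplicate file later.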

For reference, here is my retry helper, which I discussed in a previous post but have since revised: it now takes a code block defining what to do in the exception handler when one of the specified exceptions is caught.

import scala.util.control.Exception._

// Motivation for this object is described in the following blog posts:
// http://googlyadventures.blogspot.com/2015/05/what-else-i-learned-yesterday.html
// http://googlyadventures.blogspot.com/2015/06/today-i-learned-doing-things-hard-way.html

/**
 * Helper methods to retry code with configurable retry count and exceptions to handle
 */
object RetryHelper {

  /**
   * Retry any block of code up to a max number of times, optionally specifying the types of exceptions to retry
   * and code to run when handling one of the retryable exception types.
   *
   * @param maxTries Number of times to try before throwing the exception.
   * @param exceptionTypesToRetry Types of exceptions to retry. Defaults to single-element sequence containing classOf[RuntimeException]
   * @param codeToRetry Block of code to try
   * @param handlingCode Block of code to run if there is a catchable exception
   * @tparam T Return type of block of code to try
   * @return Return value of block of code to try (else exception will be thrown if it failed all tries)
   */
  def retry[T](maxTries: Int, exceptionTypesToRetry: Seq[Class[_ <: Throwable]] = Seq(classOf[RuntimeException]))
              (codeToRetry: => T)
              (handlingCode: Throwable => Unit = _ => ()): T = {
    var result: Option[T] = None
    var left = maxTries
    while (!result.isDefined) {
      left = left - 1
      // try/catch{case...} doesn't seem to support dynamic exception types, so using handling block instead.
      handling(exceptionTypesToRetry: _*)
        .by(ex => {
          if (left <= 0) {
            throw ex
          } else {
            handlingCode(ex)
          }
        }).apply({
          result = Option(codeToRetry)
        })
    }
    result.get
  }
}

Tuesday, June 9, 2015

Today I learned... JavaConverters vs. JavaConversions

I have some Scala code that interops with Java code. I had previously used

import scala.collection.JavaConversions._

to support auto-conversion of collections, allowing me to call a method expecting a Java collection by passing it a Scala collection, and the other way around.

Today I refactored my code so this conversion happens explicitly.

I got rid of the JavaConversions import and instead added this:

import scala.collection.JavaConverters._

Then whenever I need code that expects a Java collection but I have a Scala collection, I call myScalaCollection.asJava, and when I do it the other way around, I call myJavaCollection.asScala.

It's more verbose and perhaps less magical, but I think it's easier to read the code and see what is going on. I'm controlling the conversion instead of letting the compiler do it for me, so I can be aware of any performance implications, such as needlessly converting something back and forth.
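A quick sketch of what the explicit style looks like in both directions:

```scala
import scala.collection.JavaConverters._

// Scala -> Java: pass a Scala collection to an API expecting java.util.List
val scalaList = List(1, 2, 3)
val javaList: java.util.List[Int] = scalaList.asJava

// Java -> Scala: bring a Java collection back into Scala-land
val javaSet = new java.util.HashSet[String]()
javaSet.add("a")
val scalaSet: scala.collection.mutable.Set[String] = javaSet.asScala
```

Note that these conversions are wrappers, not copies, so they're cheap, and converting back with the opposite method returns the original underlying collection.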

There are some limitations. When you have a Scala queue, for example, a scala.collection.immutable.Queue[T], calling the implicit method .asJava converts it to a java.util.List, which is not exactly a queue! If you want a Java Queue, you'd have to call new java.util.LinkedList(myScalaQueue.asJava), because LinkedList implements the java.util.Queue interface and has a constructor that accepts a java.util.Collection (such as a java.util.List).
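Here's a small sketch of that workaround:

```scala
import scala.collection.JavaConverters._
import scala.collection.immutable.Queue

// .asJava on a Scala Queue yields a java.util.List, not a java.util.Queue,
// because Scala's Queue is a Seq and Seq maps to List.
val scalaQueue = Queue(1, 2, 3)
val asList: java.util.List[Int] = scalaQueue.asJava

// To get an actual java.util.Queue, copy the elements into a LinkedList.
val javaQueue: java.util.Queue[Int] = new java.util.LinkedList(scalaQueue.asJava)
```

Unlike the plain asJava wrappers, this LinkedList is a copy, so mutations to it won't be visible through the original Scala queue.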

It is also possible to implement this yourself by creating your own implicit method, e.g.:

import scala.collection.JavaConverters._
import scala.collection.Seq
import java.util.LinkedList

object ScalaSeqHelper {
  implicit class SeqImplicits[T](seq: Seq[T]) {
    def asJavaLinkedList: LinkedList[T] = {
      new LinkedList(seq.asJava)
    }
  }
}


Then if you import ScalaSeqHelper._, you can call .asJavaLinkedList on any Scala Seq (which includes the mutable and immutable Queue implementations) and get a Java LinkedList, which is also a Java Queue.

Recently I learned... Covariance and mutability

Last week I made my data generation and automated testing framework more generic.
Instead of having an engine that processed a Graph of Nodes of MySpecificData, I created an abstract engine to process over a Graph of Nodes of GenericData.

So I had an abstract method such as process(graphToProcess: Graph[Node[GenericData]])
And I had an override method such as process(graphToProcess: Graph[Node[MySpecificData]])

The compiler won't let me do this because the methods don't have the same signature, even though MySpecificData "is a" GenericData.

If you've used Java, you may have tried to assign an Integer[] to an Object[], which Java allows because its arrays are covariant. The problem is that you can then try to insert a String into that Object[], which is really an Integer[], and at runtime you'll get an ArrayStoreException.

Scala prevents this with a compilation error. Because the contents of an Array can change (it's mutable), Scala makes Array invariant and forbids you from treating an Array[Int] as an array of a more generic type, because it can't guarantee type safety at compile time.
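A quick illustration of the difference (the array assignment is shown commented out because it does not compile):

```scala
// Java allows the equivalent of this, but Scala's Array is invariant,
// so the second line is a compile-time type mismatch:
// val strings: Array[String] = Array("a", "b")
// val anyArray: Array[Any] = strings  // error: type mismatch

// Immutable Seq, on the other hand, is declared covariant (Seq[+A]),
// so widening the element type is perfectly safe and legal:
val strings: Seq[String] = Seq("a", "b")
val anything: Seq[Any] = strings
```

The widening is safe for Seq precisely because there's no way to write a wrongly-typed element back into it.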

To say "WrapperType[SpecificWrappedType] is a WrapperType[GenericWrappedType]" requires the generic type parameter of WrapperType to be covariant in the class definition of WrapperType. If WrapperType has any mutability in it, the compiler will prevent you from declaring the generic wrapped type as covariant.

It's not always desirable to make everything immutable, unless you're a diehard Haskell geek. Sometimes in production environments, we have deeply nested data structures for which creating a new copy every time we need to do something would be an unacceptable performance hit. That is the case with my Graph[Node[NodeData]] -- I really just need it to be mutable.

One possibility is to define a secondary type as a subtype of the original generic type, make that type covariant, and use it in your class, but it's not very clean and makes for strange code (and I couldn't even get this solution to compile in every situation).

The easiest thing to do is to use an annotation that will tell the Scala compiler to ignore the error due to a generic type being used in the wrong context (e.g., a covariant type used in a mutable setting).

To declare a generic type parameter as covariant, use the plus-sign prefix -- e.g., "class Blah[+T <: SomeAbstraction]" defines Blah over a generic type T that is covariant and bounded above by SomeAbstraction (i.e., T must be a subtype of it).

Try to use it, and if you get a compiler error about using a covariant type in an invariant or contravariant position, just add this import:

import scala.annotation.unchecked.{uncheckedVariance => uV}

And then just put @uV after any reference to the type (except for the definition).

E.g., a method definition might pass in parameters using this type:

def someMethod(myParam1: Seq[T @uV], myParam2: (T @uV))

(If the type appears by itself, wrap parentheses around both it and the annotation.)
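Putting the pieces together, here's a minimal sketch of a covariant wrapper with a mutable slot. The names are illustrative, not from my framework. Without the @uV annotations, the compiler rejects both the var and the setter parameter with "covariant type T occurs in contravariant position":

```scala
import scala.annotation.unchecked.{uncheckedVariance => uV}

// A covariant wrapper whose contents can be mutated. @uV suppresses
// the variance check on the var and the setter parameter; this is
// exactly the unsafe combination the compiler normally forbids.
class Slot[+T](init: T) {
  private var current: T @uV = init
  def get: T = current
  def set(value: T @uV): Unit = { current = value }
}

val s: Slot[String] = new Slot("hello")
val widened: Slot[Any] = s // allowed because T is covariant
```

Note the hole we've opened: widened.set(42) would now happily write an Int into what is really a Slot[String], and a later s.get would blow up at runtime, which is why this technique demands discipline.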

And that's how you can fool the Scala compiler into allowing you to write covariant, mutable code.

The type system is there for a reason, and strong typing can be a wonderful thing. In this case, there's simply no way to prove that our code is safe at compile-time and we have to relax the compiler's default restrictions. It would be easier if there were a compiler flag or a "begin/end" kind of annotation instead of annotating every offending usage, but at least there's a way around it.

Just be smart and don't write code that isn't typesafe. Don't try to stuff a String into an array of Ints and you should be fine. And test your code so that the resulting runtime exception will surface if you do happen to write bad code.