Thursday, May 28, 2015

Today I learned... Windows batch loop for all files

Here is a one-liner that can be used in a Windows batch script to run a command (in this case, the GraphViz dot.exe application) once for each file in a directory (with a globbed filter).

The syntax is kind of ugly, but %%f is the filename and %%~nf is the filename without the extension.

FOR %%f IN (*.gv) DO dot -Tpng ".\%%f" -o ".\%%~nf.png"

This converts all GraphViz DOT files in a directory into PNG format.

More examples and documentation here: http://academic.evergreen.edu/projects/biophysics/technotes/program/batch.htm

Today I learned... no @SuppressWarnings in Scala!

Today I learned something annoying... Scala does not have any way to suppress warnings on individual code elements.

In Java you can use the @SuppressWarnings annotation on a code element to suppress compiler warnings for that element. .NET has something similar with the [SuppressMessage] attribute. The point is that if you've decided a compiler warning is not a problem in your scenario, you don't want it cluttering up your compiler output. Ideally, you want to be able to treat warnings as errors and fail the build on a single warning -- and legitimate warnings that you're unable to suppress make that policy impossible.

For example, if you need to use a deprecated method in some third-party API, you might have done the research and gotten the code reviewed and decided it was ok. An example that I came across today was connectionCachingEnabled in OracleDataSource. (Fortunately, I ended up not needing to use it because I realized the default value is already "disabled" which is what I wanted anyway.)

I don't have a solution, unfortunately... Just an observation, and one that others have noted as well (and the Scala community has closed this feature request with "wontfix" -- see http://stackoverflow.com/questions/3506370/is-there-an-equivalent-to-suppresswarnings-in-scala).
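To make the gap concrete, here is a small sketch (the names are made up for the example): calling a deprecated method produces a compiler warning, and there is no call-site annotation to silence it -- the best you can do is isolate the call in one place.

```scala
object DeprecationDemo {
  @deprecated("use newApi instead", "1.0")
  def oldApi(): Int = 42

  // This call triggers a deprecation warning at compile time, and Scala
  // offers no per-element annotation to suppress it.
  def wrappedOldApi(): Int = oldApi()
}
```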

Thursday, May 14, 2015

Today I learned... basic authenticated HTTP connections in Scala

I needed to connect to Jenkins to install (on a remote server over SSH using JSch) the latest stable trunk build of a particular project if it was newer than the version that was already installed.

I spent a lot of time fighting with various Java/Scala APIs to do HTTP connections with authentication, and many of them didn't work as I needed and/or were too complicated to use.

I also spent a lot of time trying out existing Java APIs and StackOverflow posts for how to interact with Jenkins. I discovered through trial and error that the RisingOak Jenkins API simply does not work to get a nested job.

I finally got something to work. I'm using it in this manner:


  • Read contents of [jenkinsServerUrl]/job/JobName[/job/SubJobName]*/lastStableBuild/buildNumber to determine the last stable build number
  • Compare to build number of known installation. If greater, ensure it's a stable trunk build (if it's the same as lastStableBuild then skip checking if stable).
    • To check if it's a trunk build, parse the build parameter "branch" from the page [buildNumber]/injectedEnvVars under my Jenkins job sub URL
    • To check if it's a stable build, parse the HTML of the [buildNumber] page under the Jenkins job sub URL, and check the status icon (this is brittle, but was the only/easiest way I could find).
  • If not stable trunk build, decrement number until it's equal to the installed build, then stop.
  • If later build found, download the archive, SFTP it up, and install it using SSH commands.
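The walk-back in the steps above can be sketched as a pure function, with the stable-trunk checks abstracted into a predicate. The names here are illustrative, not from the real code:

```scala
// Walk back from the latest build number toward the installed one, returning
// the first build that satisfies the stable-trunk predicate, if any.
// The predicate stands in for the injectedEnvVars / status-icon checks.
def findNewerStableTrunkBuild(latest: Int, installed: Int,
                              isStableTrunk: Int => Boolean): Option[Int] =
  (latest until installed by -1).find(isStableTrunk)
```

For example, with latest build 10 and installed build 5, the candidates checked are 10 down to 6, and the first one passing the predicate wins.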


Here's the helper object that I use to make the authenticated HTTP requests:

package Helpers

import java.io.InputStream
import java.net.URL
import Helpers.InputStreamHelper.InputStreamExtensions
import org.apache.commons.codec.binary.Base64
import scala.io.Source

/** HTTP Helper methods */
object HttpHelper {
  def download(url: String, savePath: String, user: String = null, passwordOrToken: String = null, requestProperties: Map[String, String] = Map("User-Agent" -> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)")): Unit = {
    val inputStream = getInputStreamFromUrl(url, user, passwordOrToken, requestProperties)
    try {
      inputStream.downloadToFile(savePath)
    } finally {
      inputStream.close()
    }
  }

  def getPageContentFromUrl(url: String, user: String = null, passwordOrToken: String = null, requestProperties: Map[String, String] = Map("User-Agent" -> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)")): String = {
    val inputStream = getInputStreamFromUrl(url, user, passwordOrToken, requestProperties)
    try {
      Source.fromInputStream(inputStream).getLines().mkString("\n")
    } finally {
      inputStream.close()
    }
  }

  def getInputStreamFromUrl(url: String, user: String = null, passwordOrToken: String = null, requestProperties: Map[String, String] = Map("User-Agent" -> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)")): InputStream = {
    val connection = new URL(url).openConnection
    requestProperties.foreach({
      case (name, value) => connection.setRequestProperty(name, value)
    })

    if (user != null && passwordOrToken != null) {
      connection.setRequestProperty("Authorization", getHeader(user, passwordOrToken))
    }

    connection.getInputStream
  }

  def encodeCredentials(username: String, password: String): String = {
    Base64.encodeBase64String((username + ":" + password).getBytes)
  }

  def getHeader(username: String, password: String): String = {
    "Basic " + encodeCredentials(username, password)
  }
}
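As an aside, the same Basic header can be built with the JDK's own java.util.Base64 (JDK 8+), avoiding the commons-codec dependency. This is an alternative sketch, not what the helper above uses:

```scala
import java.util.Base64

// JDK-only equivalent of getHeader/encodeCredentials above.
def basicAuthHeader(user: String, password: String): String =
  "Basic " + Base64.getEncoder.encodeToString((user + ":" + password).getBytes("UTF-8"))

// basicAuthHeader("user", "secret") == "Basic dXNlcjpzZWNyZXQ="
```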


package Helpers

import java.io.{FileOutputStream, InputStream}

/** Created by Samer Adra on 5/13/2015. */
object InputStreamHelper {
  implicit class InputStreamExtensions(val inputStream: InputStream) {
    def downloadToFile(path: String): Unit = {
      val buffer = new Array[Byte](8 * 1024)

      val outStream = new FileOutputStream(path)
      try {
        var bytesRead = 0
        while ({bytesRead = inputStream.read(buffer); bytesRead != -1}) {
          outStream.write(buffer, 0, bytesRead)
        }
      } finally {
        outStream.close()
      }
    }
  }
}
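The same read/write loop can be exercised off the filesystem by writing to a ByteArrayOutputStream instead of a file. A stand-alone sketch:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}

// Stand-alone version of the copy loop above, collecting the bytes in
// memory so it can be tested without touching disk.
def copyAll(inputStream: InputStream): Array[Byte] = {
  val buffer = new Array[Byte](8 * 1024)
  val outStream = new ByteArrayOutputStream()
  var bytesRead = inputStream.read(buffer)
  while (bytesRead != -1) {
    outStream.write(buffer, 0, bytesRead)
    bytesRead = inputStream.read(buffer)
  }
  outStream.toByteArray
}
```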

Wednesday, May 6, 2015

Yesterday I also learned... Functional exception handling and sequence to varargs

Got this working at the end of the day yesterday and didn't get a chance to post it until now:

I experienced a network glitch doing an SFTP put, so I wanted to implement a retry wrapper that I could pass any code block to, along with the number of times to try before failing.
I wanted to pass in the types of exceptions to catch in the tries.
When I tried implementing it and passing in a variable type to the standard try/catch{case} statement (each case specifies what type(s) of exception to catch), it failed to compile.

There is an alternative way to catch exceptions in Scala -- the scala.util.control.Exception.handling() method.

handling(exceptionType1, exceptionType2, ...)
  .by(ex => handleBlock)
  .apply({
    codeBlockToTry
  })

In this way I was able to pass in a sequence of exception classes, expand it with : _* into variable-length argument form (like varargs in Java), and catch only the necessary types.
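A minimal, self-contained demonstration of expanding a sequence into handling()'s varargs (the types and values here are just for illustration):

```scala
import scala.util.control.Exception._

// A Seq of exception classes, expanded into handling()'s varargs with ": _*".
val toCatch: Seq[Class[_ <: Throwable]] =
  Seq(classOf[NumberFormatException], classOf[ArithmeticException])

// The parse fails with NumberFormatException, which is in the Seq, so the
// by-handler runs and the expression evaluates to -1.
val parsed: Int = handling(toCatch: _*).by(_ => -1).apply { "oops".toInt }
```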

The parameter in my method that specifies the exception classes has a default value of a single-element sequence containing RuntimeException, and there is a type constraint indicating that all classes in the sequence must be subtypes of Throwable.

The retry method is a curried function, e.g., f(params1)(params2). The first sequence of parameters is applied, yielding a function against the second sequence of parameters.

The code is below:

import scala.util.control.Exception._
/** Created by Samer Adra on 5/5/2015. */
object RetryHelper {
  /**
   * Retry any block of code up to a max number of times, optionally specifying the types of exception to retry.
   * @param maxTries Number of times to try before throwing the exception.
   * @param exceptionTypesToRetry Types of exception to retry. Defaults to a single-element sequence containing classOf[RuntimeException].
   * @param codeToRetry Block of code to try
   * @tparam T Return type of block of code to try
   * @return Return value of the block of code (an exception is thrown if it failed all tries)
   */
  def retry[T](maxTries: Int, exceptionTypesToRetry: Seq[Class[_ <: Throwable]] = Seq(classOf[RuntimeException]))(codeToRetry: => T): T = {
    var result: Option[T] = None
    var left = maxTries
    while (result.isEmpty) {
      left = left - 1
      // try/catch{case...} doesn't seem to support dynamic exception types, so using a handling block instead.
      handling(exceptionTypesToRetry: _*)
        .by(ex => if (left <= 0) throw ex)
        .apply({
          println("Left: " + left)
          result = Some(codeToRetry)
        })
    }
    result.get
  }
}

Tuesday, May 5, 2015

Today I learned... DestroyJavaVM can cause JVM apps to hang instead of terminate

I have a long-running application that does some network IO (Oracle and SSH). It was taking longer than expected and it looked like it was done based on the logs. When I paused it to investigate, there was nothing active on the main thread, and all the threads were waiting.

I noticed DestroyJavaVM() in one of the threads, which I had never heard of (being new to JVM-land), and I also saw another thread that was stuck at an Oracle wait method, and a few other wait threads.

It turns out I forgot to dispose a couple of my resources before closing my application. When I fixed that, my application started terminating properly again.

Coming from C#, I've been spoiled by the using() block, which neatly cleans up any IDisposables. Really it's syntactic sugar for try{//do work}finally{disposableObject.Dispose()}.

In Scala I do the try/finally. I have read that there are ways to do something similar to C#'s using(), but they didn't seem as clean or intuitive to me, plus they'd need third-party libraries.
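For completeness, a hand-rolled analogue of C#'s using() takes only a few lines in Scala, though it relies on a structural type (and hence runtime reflection), which is part of why plain try/finally feels cleaner to me. A sketch, not what I actually use:

```scala
import scala.language.reflectiveCalls

// Runs the body, then closes the resource, mirroring C#'s using block.
// The structural type { def close(): Unit } matches anything closeable.
def using[A <: { def close(): Unit }, B](resource: A)(body: A => B): B =
  try body(resource) finally resource.close()

// Example: the reader is closed even if the body throws.
val firstChar = using(new java.io.StringReader("hi")) { r => r.read().toChar }
```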

override def dispose(): Unit = {
  super.dispose()
  if (_dbConnection != null && !_dbConnection.isClosed) {
    _dbConnection.close()
  }
  _dbConnection = null
}

I call this dispose() method in a finally block.

Monday, May 4, 2015

Today I learned... Using Java inner classes in Scala

I needed to copy some files between two remote servers. The source and destination are both HDFS so I would like to use DistCp, but right now the servers can't even ping each other so until I get that resolved I have to go through my local computer first.

I used Jsch (with a local private key specified) to establish SSH connections to each server.
I get the files from Hadoop onto the filesystem of the source server using hadoop fs -copyToLocal.
Next I SFTP the files to my local computer.
Next I SFTP the files to the destination server.
Finally I put the files in HDFS using hdfs dfs -put.

To do the SFTP to local I use the ls method of ChannelSftp (which is part of the Jsch library).
This returns a Vector of Objects, but all the objects are really of type LsEntry (so why didn't they just make it generic?). LsEntry is an inner class inside the ChannelSftp class.

My application is written in Scala, which is a JVM language, so Scala and Java code can reference each other in the same way that C# code and VB.NET code can use each other in Microsoft-land.

The problem was I couldn't cast those objects to LsEntry because the compiler kept complaining that com.jcraft.jsch.ChannelSftp.LsEntry was not valid. The funny thing is, when I would just type obj.asInstanceOf[LsEntry], the IDE knew what I was referring to because the tooltip asked if I was trying to use com.jcraft.jsch.ChannelSftp.LsEntry. When I said ok, it auto-added the import statement import com.jcraft.jsch.ChannelSftp.LsEntry, and then gave a "not found" error for the import it had just added.

After much head-banging (of the against-the-wall sort, not the heavy-metal rocker sort), I found out that there is no way to include a Java inner class in a Scala import statement.

For Scala code to reference a Java inner class, the only way is to use the hash symbol. What finally worked was:

obj.asInstanceOf[ChannelSftp#LsEntry]

Apparently the reason for this is, whereas Java treats an inner class as part of the enclosing class, Scala treats it as part of an object of the enclosing class. So by default, if you declare two enclosing objects of the same type, then declare in each an object of the inner class, those two inner objects will not have the same type, because they will be under different enclosing objects. The # allows you to get an inner class defined under the class, as Java does.

I might actually use this property to restructure my program... Instead of having Graph and Node at the same level, I might make Node an inner class of Graph, thus guaranteeing that all Node operations happen in the same Graph (e.g., a Node addChild method that takes in another node would only accept another node in the same graph).
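The difference can be seen in a few lines (Graph and Node here are a toy sketch, not my real classes):

```scala
class Graph {
  class Node(val name: String)
  def newNode(name: String): Node = new Node(name)
}

val g1 = new Graph
val g2 = new Graph

val a: g1.Node = g1.newNode("a")      // path-dependent: tied to g1
// val b: g1.Node = g2.newNode("b")   // does not compile: g2's nodes have a different type
val any: Graph#Node = g2.newNode("b") // Graph#Node accepts a Node from any Graph
```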

Changing this blog up a little

"Brevity is the soul of wit." So I'm going to start doing something a bit different... Instead of these long-winded posts that no one will ever read, I'll try to post more regularly and will hopefully just make short posts about something I learned (starting... next post).

This post I just want to mention that I've been working at a new job since January. I'm currently a software development engineer in test (SDET) working at FINRA.

I'm working in JVM land now, which I haven't really done since college (except for a Java-in-Visual Studio MSBuild template I built a few years ago, which hardly counts).

My project since joining FINRA has been Big Data test data generation and automated testing. I have engineered a system written in Scala that generates both the inputs and the expected outputs to a Big Data application called OATS, which takes in financial orders from broker-dealers, exchanges, etc., and links them all together so you can see the entire lifecycle of an order. I've modeled the states and their allowed transitions, and based on that am able to randomly generate data that I could expect the application would link together. I'm also leveraging this generation engine to assist us in testing specific scenarios, which is done either by hardcoding a graph in Scala or by specifying a directory of DOT files describing a graph. The generator reads in the DOT files, transforms them into graphs of events satisfying each DOT file, fills in random data, and performs the automated testing using that input data and the generated expected outputs.

It's a lot of fun, and I've been learning a lot doing it.