Tuesday, December 28, 2021

A bug is a missing test case

I came across this quote a few years ago:

"Treat bugs as proof of a missing test case. When a bug is found, figure out a way to implement a test case." -Jeremy Dunck*

 

What's the first thing you have to do to fix a bug? Reproduce it! And what's the last thing you have to do before you push the fix? Test it!

We can feed two birds with one scone by adopting the mantra "a bug is a missing test case."

Next time you have to fix a bug, write a test to reproduce it. Ideally this would be an automated unit test, but this advice is true regardless of what level your tests are at or whether or not they are automated. If a repository or file has zero unit tests, a bug fix is a great opportunity for you to write its first unit test.

 

Write this test case before you change any code to fix the bug. Follow the standard "red-green-refactor" steps: Based on the steps to reproduce the bug, write a test that fails (goes red), and run it to confirm the defect. Change the code to make the test pass (go green). Then refactor the code. Before you commit the fix, run regression tests to make sure you didn't break existing functionality.
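
For example, here's a minimal sketch of that flow using ScalaTest (the function, the bug, and the test are all hypothetical):

import org.scalatest.funsuite.AnyFunSuite

// Hypothetical code under test. Bug report: parseIds("") throws a
// NumberFormatException instead of returning an empty set.
object IdParser {
  def parseIds(csv: String): Set[Int] =
    if (csv.trim.isEmpty) Set.empty // the fix -- without this line, the test below goes red
    else csv.split(",").map(_.trim.toInt).toSet
}

class IdParserTest extends AnyFunSuite {
  // Red: written from the bug's steps to reproduce; run it and confirm it fails.
  // Green: apply the one-line fix above and run it again.
  // Refactor: clean up, then run the full regression suite before committing.
  test("parseIds returns an empty set for an empty string") {
    assert(IdParser.parseIds("") == Set.empty[Int])
  }
}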

 

Your new test should serve as a regression test to make sure that the fixed bug never recurs.

 

In addition to writing the test and fixing the bug, ask yourself how you could have prevented the bug in the first place. Was it an edge case or exceptional scenario that was overlooked when the initial tests were written? Did you think it wasn't risky enough to test? Was it too hard to test initially? Is the code base too complex, too tightly coupled to its dependencies, untestable? Is there a coding lesson to be learned? Is there any process you could change to catch such bugs next time? Could it have been caught in a review of either the code or the tests? Human error is rarely a satisfactory root cause. Don't keep these lessons to yourself.

 

Is it possible that similar bugs are lurking elsewhere? Do testing gaps exist? Will you find out before your users do?

 

Writing tests after you find a bug is a wonderful idea, but it's better if you can write tests and learn to improve your code so you can prevent bugs before they happen (see my previous post about building quality in). Even 100 percent code coverage is not a panacea -- you still need careful thought to come up with good test cases and do comprehensive assertions.
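
Here's a contrived sketch of why: the first test below executes every line of a hypothetical abs function, so a coverage tool reports 100 percent, yet it asserts nothing. The second test actually checks values, and its boundary case uncovers a genuine surprise.

import org.scalatest.funsuite.AnyFunSuite

class AbsTest extends AnyFunSuite {
  // Hypothetical function under test.
  def abs(x: Int): Int = if (x >= 0) x else -x

  test("abs runs") {
    abs(5)  // covers the positive branch
    abs(-5) // covers the negative branch: 100 percent coverage, zero assertions
  }

  test("abs returns the magnitude") {
    assert(abs(5) == 5)
    assert(abs(-5) == 5)
    assert(abs(0) == 0)
    // Boundary case: -Int.MinValue overflows back to Int.MinValue, so the
    // result is still negative. Coverage alone would never surface this.
    assert(abs(Int.MinValue) == Int.MinValue)
  }
}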

 

* I'm not sure of the exact source of this quote. If anyone knows, please share! The version above is the earliest one I found, from the old Joel on Software forum in 2004.

Build Quality In!

Here are some insightful quotes from one of the 20th century's leading experts on management and quality, William Edwards Deming.

 

"Build quality in."

 

"Quality can not be inspected into a product or service; it must be built into it."

 

"Inspection does not improve the quality nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product."

 

This blog will focus on coding and testing, but these quotes apply more universally. For example: Maintaining yourself – your body, your mental state, your skills – is one way to build quality in. Please don’t code drunk.

 

“Building quality in” is the philosophy I think we need to have about software testing. If you’ve heard the term “shift left,” it’s the same idea (test earlier).

If we let testing be a phase that happens after coding, here are some of the problems we encounter:

  • Rework: Bugs found in testing result in test-and-fix cycles.
  • More defects missed or found after initial testing: Testing at the end can't keep up. As the code grows, running regression tests takes longer and longer, and there are always new features to be tested. Either the test cycle keeps growing or some tests get skipped, causing defect leakage.
  • Context switching and information loss: Often, by the time a bug feeds back to the devs, they are working on something else. They might have forgotten the details and are more likely to make mistakes. Merge conflicts also become more likely.
  • High work-in-process (WIP): A result of the two problems above. Many things will be "dev complete" but stuck in the test-and-fix cycle -- never truly done.
  • Integration and stabilization sprints: If the devs keep writing untested code, at some point they have to stop writing code so we can stabilize things to rerun all the tests.
  • Overreliance on black-box testing, which is usually slower and harder to run in a continuous-integration (CI) environment.
  • Large batch size (large stories, large releases): When tests take too long and require manual effort, feedback cycles are too long and developers learn to cram as much stuff as possible into a change.
  • Long cycle time: Influenced by all of the above. This includes long time to recover from production issues.
  • Unpredictability: Large batch sizes, long cycle times, and long feedback loops combine to produce an opaque process with unpredictable timelines.
  • Missed deadlines: When you do the testing at the end and find a serious problem, it might be too late to fix (and retest) in time.
  • Partial testing: Rerunning all the tests becomes long and expensive, so we take shortcuts and say the code didn't change enough to have to rerun tests. And that's when it breaks.
  • Developer and QA -- separate roles, possibly even separate teams. "Us vs. them." Low trust. Low cohesion. Disparate goals.
    • While I believe there is value in having test specialists and in doing exploratory and usability testing, high performers in the 21st century (not just 2021 -- I mean like 2001) do not outsource quality. A development task is not complete until the developer can prove that it's working in an automated way, not "it works on my machine."
  • Unrefactorability (I just made that word up): When testing takes too long, developers will be too scared to change the code.
    • Future changes will be hacked and patched in instead of designed in.
  • Overwork and extra hours: If you have all of the above, what else would you expect? The only way anything is going to get done is by people working their tails off. Often there is a tendency to reward those heroes, which helps perpetuate the behavior.
  • No time to learn or collaborate: You think the business is going to allow you slack time to learn and experiment, or pair/mob program, when you've missed all your deadlines and are drowning in bugs because you're not building quality in?
  • More micromanaging: Similar to the above, if you're missing all your dates because of testing and quality issues, be prepared for more control measures: multiple status updates per day, more estimates, and more requirements that dictate implementation details.
  • All of the above can lead to low morale, low retention, and difficulties in recruitment.
  • Leakage of customer data, poor products, bad customer experience, loss of reputation, downsizing, bankruptcy: If all of the above were happening, doesn't this seem inevitable?

 

You may think I'm fearmongering. Keep in mind that most developers and teams nowadays are doing at least some test automation, in which case the above problems might not be as pronounced.

 

Image credit: https://www.slideshare.net/lazygolfer/testing-does-not-equal-quality

 

Here are some ideas on how to improve:

  • The team, especially leadership, needs to vigorously resist schedule pressure and overcommitment. Refuse to cut corners on quality, and do not offer it as a tradeoff. Speak up about the risks caused by technical debt and testing debt.
  • For testing to have true value, it should influence, and ideally be an indivisible part of, the development process. Revisit your "definition of done" with this in mind.
  • Developers need to write unit tests together with their code -- in the same commit. If you want tests that inform the design and lead to better, more testable code, consider writing unit tests before the code.
  • All unit tests should be run in an automated environment (CI) on every commit. All tests must pass (without cheating and rerunning failures) before code is accepted.
  • Measure and improve your code coverage.
    • Warning! While code coverage is a useful metric to indicate risk -- low coverage indicates lack of testing and high risk -- do not rely on it blindly. High coverage does NOT indicate good testing or low risk! You need to write tests that exercise your code with appropriate sets of inputs (including boundary conditions and error cases), and your tests need to perform appropriate assertions.
  • You can set a code-coverage gate to make your build fail if coverage drops below a specified percent. Start with the current level and increase it over time as you improve your testing.
    • To set up a coverage gate in Gradle you can do something like the following:

gradle.properties:

minimumLineCoverage=0.90

build.gradle:

jacocoTestCoverageVerification {
    violationRules {
        rule {
            limit {
                // Counter options: INSTRUCTION, LINE, BRANCH, COMPLEXITY, METHOD, CLASS
                counter = 'LINE'
                minimum = minimumLineCoverage as BigDecimal
            }
        }
    }
}

check.dependsOn jacocoTestCoverageVerification
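
With this in place, running ./gradlew check (or any task that depends on check, such as build) will fail the build whenever line coverage drops below 90 percent.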

  • Other types of tests, not just unit, can be planned, and ideally automated, at the beginning of development. Consider a "Three Amigos" approach to BDD and executable specifications.
  • Rome wasn't built in a day. For a code base with no automated unit tests, try starting the practice with new code. And as you revisit and change old code, start writing unit tests for the changed code. You may also find the code is untestable, in which case I recommend putting in place some higher-level integration tests so you at least get some coverage; and once those are in place you will have more confidence to refactor and make your code unit-testable.
  • Developers and testers can work together -- pair on test cases, pair on coding, and both will learn. Here's a fun video related to that: Sleeping with the enemy
  • Pairing or mobbing can serve as a live code review -- a way to “build quality in” instead of inspecting code at the end with merge requests. It may not work in every situation, but at least do some collaboration early in development.

 

Image credit: https://www.methodsandtools.com/archive/archive.php?id=94

 

Thanks for reading!

Today I learned... Scala 3 is awesome.

My previous post was about a DIY implementation of union types in Scala.

I have been doing Kotlin recently (also awesome), and just got around to looking into Scala 3... And lo and behold, they added union types -- nice!
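
For example, here's a minimal sketch of the built-in syntax (the method is hypothetical, but the | type syntax is standard Scala 3):

def describe(x: Int | String): String = x match {
  case i: Int    => s"got the number $i"
  case s: String => s"got the text $s"
}

describe(42)      // compiles
describe("hello") // compiles
// describe(3.14) would not compile: Double is not in the union

Compare this with the hand-rolled encoding in the 2017 post below.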

They also made Null no longer a subtype of every reference type (an opt-in feature called explicit nulls) and use union types to model nullable types, such as String | Null.
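
A tiny sketch of that (hypothetical method; assumes the opt-in -Yexplicit-nulls compiler option, under which flow typing narrows the type after a null check):

def lengthOf(s: String | Null): Int =
  if (s == null) 0
  else s.length // flow typing has narrowed s from String | Null to String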

This is similar to how Kotlin and C# model nullable types, although they use a question mark suffix (which is actually really clean and lends itself to cool things like the Elvis operator).

And they cleaned up extension methods (now more like Kotlin and C#) so they don't go through the convoluted implicits (which have now been simplified).

These are breaking changes and require opt-in, and there are more changes, but definitely a worthwhile upgrade that I look forward to using.

Thursday, February 16, 2017

Recently I learned... Union Types

Union types can add powerful functionality in cases where it would be easy to give up and lose type safety.

Suppose you have a typed configuration key -- for example, myIds.
myIds stores a value of type Set[Int], but because it's a configuration key, sometimes you might set it in code with a Set[Int], but sometimes you might want to set it from a String value from a config file.

So let's say myIds has a setValue method. We would like to be able to call setValue with a Set[Int] or with a String, but nothing else, and it should be safe at compile-time.

For example:
myIds.setValue(Set(1, 2, 3))
myIds.setValue("1,2,3")

We could give up type-safety:

private var internalValue: Set[Int] = Set.empty

def setValue(value: Object): Unit = {
  if (value.isInstanceOf[String]) {
    internalValue = convertFromString(value.asInstanceOf[String])
  } else if (value.isInstanceOf[Set[_]]) { // Set of what? Grr, type erasure is so evil.
    internalValue = value.asInstanceOf[Set[Int]]
  } else {
    throw new UglyAndHorribleRuntimeException("Wish we could have prevented this at compile-time!")
  }
}

How can we make it compile-time safe instead of accepting Object?

Enter Union Types!
Scala has intersection types -- just use "with" and you get a type made by intersecting two types.
It is possible to do union types as well, but it's not built into the language, so it's a bit more work.

Ideally we want to do something like:

type SetOrString = Set[Int] ∨ String
def setValue(value: SetOrString): Unit
...

Thanks to some help from StackOverflow, this can be accomplished! It's not 100% clean, but pretty good.

// http://stackoverflow.com/questions/3508077/how-to-define-type-disjunction-union-types
// http://www.edofic.com/posts/2013-01-27-union-types.html
// http://milessabin.com/blog/2011/06/09/scala-union-types-curry-howard/

// We want to be able to define union types so we can have, for example, a method that
// accepts either an Int or a String, and for this method to be type-safe at compile-time,
// and for it to work on primitives rather than having to use a boxed type such as Either.
// Scala allows us to define type intersection using the "with" operator.
// Scala does not have a union operator.
// It's possible to define union using De Morgan's law: X or Y = NOT(NOT(X) AND NOT(Y))
// Another way to think about it - from StackOverflow.com:
// Given type ¬[-A], which is contravariant on A, by definition given A <: B we can write ¬[B] <: ¬[A],
// inverting the ordering of types.
// Given types A, B, and X, we want to express X <: A || X <: B.
// Applying contravariance, we get ¬[A] <: ¬[X] || ¬[B] <: ¬[X].
// This can in turn be expressed as ¬[A] with ¬[B] <: ¬[X], in which one of A or B must be a supertype
// of X or X itself (think about function arguments).

import scala.language.higherKinds

/**
 * Negation
 * @tparam A Type to negate
 */
sealed trait ¬[-A]

/**
 * Type set
 */
sealed trait TSet {
  type Compound[A]
  type Map[F[_]] <: TSet
}

/**
 * Null type set
 */
sealed trait ∅ extends TSet {
  type Compound[A] = A
  type Map[F[_]] = ∅
}

/**
 * Type union
 * Note that this type is left-associative for the sake of concision.
 * @tparam T The rest of the type set (the tail)
 * @tparam H The head type
 */
sealed trait ∨[T <: TSet, H] extends TSet {
  // Given a type of the form `∅ ∨ A ∨ B ∨ ...` and parameter `X`, we want to produce the type
  // `¬[A] with ¬[B] with ... <:< ¬[X]`.
  type Member[X] = T#Map[¬]#Compound[¬[H]] <:< ¬[X]

  // This could be generalized as a fold, but for concision we leave it as is.
  type Compound[A] = T#Compound[H with A]

  type Map[F[_]] = T#Map[F] ∨ F[H]
}


type UnionType = ∅ ∨ Set[Int] ∨ String

def setValue[SetOrString : UnionType#Member](value: SetOrString): Unit = {
  if (value.isInstanceOf[String]) {
    internalValue = convertFromString(value.asInstanceOf[String])
  } else { // must be a Set[Int], the only other member of the union
    internalValue = value.asInstanceOf[Set[Int]]
  }
}

No more runtime exception!
setValue("1,2,3") // Compiles
setValue(Set(1,2,3)) // Compiles
setValue(1) // Does not compile!

In my production code, I have a generic type Key that has a generic type parameter, so in my case instead of Set[Int] I would have some generic type T that varies based on whether it's an IntKey or an IntSetKey or something more complicated. The generics all work with this, and as a bonus, StringKey works as well (so String union String still works).
Understanding the set theory behind this is not a prerequisite for using it, so try it out and have some clean, type-safe code!

Wednesday, March 16, 2016

Today I taught someone... Scala implicits

Implicit classes / implicit conversions:

Suppose you have a Spark RDD and you want to call rdd.groupByKey()… It will fail to compile because the RDD class does not have a groupByKey() method.
Add an import to org.apache.spark.rdd.PairRDDFunctions, and it will magically compile.
PairRDDFunctions is a class that defines extra functions you can use on an RDD. The class definition for PairRDDFunctions is honestly rather confusing… But here is a simpler example that shows how implicit classes can add additional functionality to a class:

implicit class StringExtensions(val str: String) extends AnyVal {
  /**
    * Whether or not the string is comprised entirely of digits (doesn't accept negative numbers)
    * @return Whether or not the string is numeric
    */
  def isNumeric: Boolean = {
    str.forall(Character.isDigit)
  }
}

After adding an import of StringExtensions, you can now write code like “12345”.isNumeric and it will compile even though that is not a method in java.lang.String. This is similar to C#’s extension methods.

Another handy use of this that you may come across is Java/Scala interoperability, to convert between the Java and Scala collection types.
Let’s say we’re in Java 7 (no streams, boo hoo) and want to do some functional programming on a java.util.List.
Maybe it’s a List<String> and we want to find all the numeric strings and do some crazy map, like for each string, get a tuple of the reversed string and the summation of the digits in the string (not sure why we would ever need to do this). Then we will return it back as a Java list of the tuples.
If we import scala.collection.JavaConverters._, then we can do this in one (long) line:
javaList.asScala.filter(_.isNumeric).map(str => (str.reverse, str.foldLeft(0)((accumulatedValue, char) => accumulatedValue + char.toString.toInt))).asJava
Here, asScala is an implicit conversion from a Java List to a Scala collection, isNumeric is the implicit method we defined above, and reverse comes from Scala's StringOps via another implicit conversion (java.lang.String itself has no reverse method).
The map function turns the string into a 2-tuple of the reversed string and the summation of the digits in the string (done using foldLeft, but not the only way to do it).
The map returns a Scala sequence, so .asJava converts it back to a Java List.
If you import scala.collection.JavaConversions._ instead (-ions instead of -ers), then you can remove the .asScala and .asJava, and it will still work, but doing the conversions explicitly improves readability.

Implicit parameters 

Looking in the Spark documentation, it says you can define an accumulator of type Double, given a Spark context, by doing context.accumulator(0.0).
But this does not work unless you add an import of org.apache.spark.SparkContext._ -- this is because, although the SparkContext class does have an accumulator method, it takes additional parameters, which are marked as implicit. The SparkContext companion object defines implicit objects of the types needed by the accumulator functions. So once you import those and get them in scope, you can call accumulator with a single parameter, and the compiler will fill in the additional parameter(s) based on the implicit values in scope.

object Main extends App {
  def sayHello(name: String)(implicit greeting: String): Unit = {
    println(s"$greeting, $name")
  }

  // sayHello("John") // does not compile because no greeting is given
  sayHello("Paul")("Hello")

  implicit val greeting = "Hi"
  sayHello("George") // Works because there is an implicit of the same type in scope
}
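
Running this prints "Hello, Paul" followed by "Hi, George".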