Thursday, December 08, 2011

Simple linear regression in Scala

Here is how to compute a simple linear regression in Scala.
Class LinearRegression takes n measurements as a List[(Double, Double)] of (x, y) pairs and computes the line that best fits the data according to the least squares criterion.
This Scala program is a translation of the Java program available at .
class LinearRegression(val pairs: List[(Double,Double)]) { 
 val size = pairs.size
 println("pairs = " + pairs)

 // first pass: read in data, compute xbar and ybar
 val sums = pairs.foldLeft(new X_X2_Y(0D,0D,0D))(_ + new X_X2_Y(_))
 val bars = (sums.x / size, sums.y / size)

 // second pass: compute summary statistics
 val sumstats = pairs.foldLeft(new X2_Y2_XY(0D,0D,0D))(_ + new X2_Y2_XY(_, bars))

 val beta1 = sumstats.xy / sumstats.x2
 val beta0 = bars._2 - (beta1 * bars._1)
 val betas = (beta0, beta1)

 println("y = " + ("%4.3f" format beta1) + " * x + " + ("%4.3f" format beta0))

 // analyze results
 val correlation = pairs.foldLeft(new RSS_SSR(0D,0D))(_ + RSS_SSR.build(_, bars, betas))
 val R2 = correlation.ssr / sumstats.y2
 val svar = correlation.rss / (size - 2)
 val svar1 = svar / sumstats.x2
 val svar0 = ( svar / size ) + ( bars._1 * bars._1 * svar1)
 val svar0bis = svar * sums.x2 / (size * sumstats.x2)
 println("R^2                 = " + R2)
 println("std error of beta_1 = " + math.sqrt(svar1))
 println("std error of beta_0 = " + math.sqrt(svar0))
 println("std error of beta_0 = " + math.sqrt(svar0bis))
 println("SSTO = " + sumstats.y2)
 println("SSE  = " + correlation.rss)
 println("SSR  = " + correlation.ssr)
}

object RSS_SSR {
 def build(p: (Double,Double), bars: (Double,Double), betas: (Double,Double)): RSS_SSR = {
  val fit = (betas._2 * p._1) + betas._1
  val rss = (fit-p._2) * (fit-p._2)
  val ssr = (fit-bars._2) * (fit-bars._2)
  new RSS_SSR(rss, ssr)
 }
}

class RSS_SSR(val rss: Double, val ssr: Double) {
 def +(p: RSS_SSR): RSS_SSR = new RSS_SSR(rss+p.rss, ssr+p.ssr)
}

class X_X2_Y(val x: Double, val x2: Double, val y: Double) {
 def this(p: (Double,Double)) = this(p._1, p._1*p._1, p._2)
 def +(p: X_X2_Y): X_X2_Y = new X_X2_Y(x+p.x,x2+p.x2,y+p.y)
}

class X2_Y2_XY(val x2: Double, val y2: Double, val xy: Double) {
 def this(p: (Double,Double), bars: (Double,Double)) = this((p._1-bars._1)*(p._1-bars._1), (p._2-bars._2)*(p._2-bars._2),(p._1-bars._1)*(p._2-bars._2))
 def +(p: X2_Y2_XY): X2_Y2_XY = new X2_Y2_XY(x2+p.x2,y2+p.y2,xy+p.xy)
}
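
For readers who only need the slope and intercept, the two passes above can be condensed into a minimal sketch. The object name LeastSquaresSketch and the method fit are hypothetical, not part of the original program, and the standard errors and R^2 are left out.

```scala
object LeastSquaresSketch {
  // Returns (beta0, beta1) for the least-squares line y = beta1 * x + beta0.
  def fit(pairs: List[(Double, Double)]): (Double, Double) = {
    val n = pairs.size
    // First pass: means of x and y.
    val xbar = pairs.map(_._1).sum / n
    val ybar = pairs.map(_._2).sum / n
    // Second pass: centered sums Sxx = sum (x - xbar)^2 and Sxy = sum (x - xbar)(y - ybar).
    val sxx = pairs.map { case (x, _) => (x - xbar) * (x - xbar) }.sum
    val sxy = pairs.map { case (x, y) => (x - xbar) * (y - ybar) }.sum
    val beta1 = sxy / sxx
    val beta0 = ybar - beta1 * xbar
    (beta0, beta1)
  }
}
```

For points lying on y = 2x + 1, such as (0,1), (1,3), (2,5), (3,7), fit returns an intercept close to 1 and a slope close to 2.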

Tuesday, November 29, 2011

Concrete Scala Map and SortedMap example

This is a concrete Scala Map and SortedMap example.
I wrote this post because I found it very difficult to find the right information when trying to get a concrete Map and SortedMap.
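import scala.collection.SortedMap

class StringOrder extends Ordering[String] {
 // Assumption: natural String ordering; the original right-hand side is missing.
 override def compare(s1: String, s2: String) = s1.compareTo(s2)
}
class MyParameter() {}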

class ZeParameters(val pairs:List[(String,MyParameter)] = Nil) extends SortedMap[String,MyParameter] {
 /**** Minimal Map stuff begin ****/
 lazy val keyLookup = Map() ++ pairs
 override def get(key: String): Option[MyParameter] = keyLookup.get(key)
 override def iterator: Iterator[(String, MyParameter)] = pairs.reverseIterator
 override def + [B1 >: MyParameter](kv: (String, B1)) = {
  val (key:String, value:MyParameter) = kv
  new ZeParameters((key,value) :: pairs)
 }
 override def -(key: String): ZeParameters  = new ZeParameters(pairs.filterNot(_._1 == key))
 /**** Minimal map stuff end ****/
 /**** Minimal SortedMap stuff begin ****/
 def rangeImpl (from: Option[String], until: Option[String]): ZeParameters = {
  val out = pairs.filter((p: (String, MyParameter)) => {
   // Assumption: the key is compared to each bound with `ordering`;
   // the right-hand sides of both assignments are missing from the original.
   var compareFrom = 0
   from match {
    case Some(s) => compareFrom = ordering.compare(p._1, s)
    case _ =>
   }
   var compareUntil = 0
   until match {
    case Some(s) => compareUntil = ordering.compare(p._1, s)
    case _ =>
   }
   compareFrom>=0 && compareUntil<=0
  })
  new ZeParameters(out)
 }
 def ordering: Ordering[String] = new StringOrder
 /**** Minimal SortedMap stuff end ****/
}
Do not forget that you can also transform your map into a list and then use sortBy:
class ListSort {
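
A minimal sketch of the list-and-sortBy idea, assuming a hypothetical Map[String, Int]; the object name ListSortSketch and the sample data are illustrative only.

```scala
object ListSortSketch {
  // Hypothetical data: parameter names mapped to integer values.
  val params: Map[String, Int] = Map("gamma" -> 3, "alpha" -> 1, "beta" -> 2)

  // Turn the map into a list of pairs, then sort by key.
  val byKey: List[(String, Int)] = params.toList.sortBy(_._1)

  // Same list, sorted by value in descending order.
  val byValueDesc: List[(String, Int)] = params.toList.sortBy(-_._2)
}
```

The result is a plain List of pairs, so the ordering lives in the list and not in the map itself.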

Thursday, November 24, 2011

SW development / Agile SCRUM quotes

Ziv’s Law: Software Development is Inherently Unpredictable

Humphrey’s Law: Users Do Not Know What They Want Until They See Working Software

Conway’s Law: The Structure of the Organization Will Be Embedded in the Code

Wegner’s lemma: an interactive system can never be fully specified nor can it ever be fully tested

Langdon’s lemma: software evolves more rapidly as it approaches chaotic regions (taking care not to spill over into chaos)

Thursday, September 01, 2011

When programming in Java or Scala, I miss the C preprocessor macros __FILE__, __LINE__ and __FUNC__. I use them to log where I am in my programs.

Well, I decided to have those in Scala, by parsing the stack trace after throwing an exception. I personally don't care if it takes time to execute.

There is one advantage compared to the C macros: you can get any upper level in the calling stack, which I sometimes find handy.

object util {
 val MatchFileLine = """.+\((.+)\..+:(\d+)\)""".r
 val MatchFunc = """(.+)\(.+""".r
 def main(args: Array[String]): Unit = { }
 // Returns "File:line" for the stack frame at depth i_level, or "" if the frame cannot be parsed.
 def tag(i_level: Int): String = {
  val s_rien = ""
  try {
   throw new Exception()
  } catch {
   case unknown: Throwable => unknown.getStackTrace.toList.apply(i_level).toString match {
    case MatchFileLine(file, line) => file+":"+line
    case _ => s_rien
   }
  }
 }

 // Returns the method name of the stack frame at depth i_level.
 def func(i_level: Int): String = {
  val s_rien = "functionNotFound"
  try {
   throw new Exception()
  } catch {
   case unknown: Throwable => unknown.getStackTrace.toList.apply(i_level).toString match {
    case MatchFunc(funcs) => funcs.split('.').toList.last
    case _ => s_rien
   }
  }
 }
}

class util() { }
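
As a variant, and not the approach used above, the same information can be read from the StackTraceElement getters without throwing, catching, or regex parsing. This is a minimal sketch; the object name CallSiteSketch and the helper frame are hypothetical. Unlike the regex version, tag keeps the file extension.

```scala
object CallSiteSketch {
  // Element 0 of the trace is this helper, so level 0 maps to the public method
  // that called it, level 1 to that method's caller, and so on.
  private def frame(level: Int): Option[StackTraceElement] =
    new Throwable().getStackTrace.drop(level + 1).headOption

  // "File.scala:42"-style tag, or "" when the frame or its file name is unavailable.
  def tag(level: Int): String =
    frame(level)
      .filter(_.getFileName != null)
      .map(e => e.getFileName + ":" + e.getLineNumber)
      .getOrElse("")

  // Method name of the frame, or "functionNotFound" when the stack is too shallow.
  def func(level: Int): String =
    frame(level).map(_.getMethodName).getOrElse("functionNotFound")

  // Example caller: level 1 is the method that called func.
  def demo(): String = func(1)
}
```

As with the original, a higher level walks further up the calling stack.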

Friday, August 19, 2011

After 8 years, 6 specific reasons for getting as many contacts as possible on LinkedIn

8 years ago, I started to be very active on social business networking. I registered on LinkedIn, Viadeo and Xing.

I then sent a huge number of invitations, using some programming techniques, gathering and guessing emails. And I got a fairly high number of contacts...

I now use Viadeo for France (6000+ 1st level contacts), LinkedIn for the world (20000+ 1st level contacts) and Xing for German-speaking countries (1000+ 1st level contacts).
Read my 2 most popular previous posts about business social networking:

Here are 6 specific reasons why I feel I made the right choice in getting as many contacts as possible:
  1. In 2007, through a chain of “weak contacts(*)” on LinkedIn, I was able to hire a new employee for my team.
  2. In 2012, I was instrumental in helping an ex-colleague from the Bay Area get hired by a famous company in Cupertino. (I live in France and work in Switzerland.)
  3. Once, I received a mail from someone who wrote: “I want to thank you for just having forwarded an introduction for me, and because of this simple action, I got a 200K$/year job”
  4. In 2011, someone from my company's HR asked me several times to advertise job positions on LinkedIn, knowing that I had a lot of contacts and that the positions would get greater visibility.
  5. I regularly forward business opportunities to business development people in my company.
  6. Getting so many contacts demonstrates some persuasion ability. I connected with 50% of my contacts through carefully crafted invitations. The other 50% invited me.

As you can see, I did not get any personal advantage in having so many contacts. Networking is something I consider primarily as a service to others: to get a job, hire someone or simply do business together.

Here are the reasons why you should network with people such as me who have huge networks:
  1. You will see more profiles.
  2. You will be seen by more people.
  3. I always answer positively to requests for help, even if the requester is a young student working in another part of the world.
  4. If a request comes through me, it will be forwarded quickly because:
    • I am accessing my networking sites daily.
    • I know how to use the networking tools.
    • You will probably need fewer network hops, as I have access to more people.
  5. I don’t do spam, because I value my huge network too much to annoy people with mass mailing.

(*) A “weak contact” is someone you have never closely interacted with in real life.

Thursday, March 10, 2011

Scala TreeSet: foreach vs foldLeft

Here is the best article I found to understand what foldLeft is. And here is an example of how to replace foreach with foldLeft:

class KbdKey() {
  def generateQuartets(): ListSet[Quartet] = {...}
}

class KbdMatrix() {
  var ts_keys = new TreeSet[KbdKey]()

  var l_quartets = ListSet.empty[Quartet];
  ts_keys.foreach(l_quartets ++= _.generateQuartets())

  // 2 lines above can be replaced by the line below

  val l_quartets = ts_keys.tail.foldLeft(ts_keys.head.generateQuartets)(_ ++ _.generateQuartets)
}
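
The same replacement can be tried on a self-contained sketch. Here Int keys and the function generate are hypothetical stand-ins for KbdKey and generateQuartets, and the fold starts from an empty ListSet instead of the head element.

```scala
import scala.collection.immutable.{ListSet, TreeSet}

object FoldLeftSketch {
  // Hypothetical stand-in for KbdKey.generateQuartets: each key yields a small ListSet.
  def generate(n: Int): ListSet[Int] = ListSet(n, n * 10)

  val keys: TreeSet[Int] = TreeSet(3, 1, 2)

  // Mutable accumulation with foreach.
  def withForeach: ListSet[Int] = {
    var acc = ListSet.empty[Int]
    keys.foreach(k => acc ++= generate(k))
    acc
  }

  // Same result with foldLeft and no var.
  def withFoldLeft: ListSet[Int] =
    keys.foldLeft(ListSet.empty[Int])(_ ++ generate(_))
}
```

Starting from ListSet.empty also covers an empty TreeSet, where calling head would throw.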

Wednesday, March 09, 2011

Scala Treeset example: partition

var ts_keys = new TreeSet[KbdKey]()(new CompareRowThenCol())

val (ts_inferiorOrEqualRowKeys,ts_superiorRowKeys) = ts_keys.partition(_.i_row<=i_row)

class CompareRowThenCol extends Ordering[KbdKey] {
    def compare(k1: KbdKey, k2: KbdKey) = k1.value-k2.value
}
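
A runnable sketch of the same partition call, assuming a hypothetical Key case class in place of KbdKey and an ordering built with Ordering.by instead of the value field used above:

```scala
import scala.collection.immutable.TreeSet

object PartitionSketch {
  // Hypothetical key type with a row and a column.
  case class Key(row: Int, col: Int)

  // Order by row first, then by column.
  val rowThenCol: Ordering[Key] = Ordering.by((k: Key) => (k.row, k.col))

  val keys: TreeSet[Key] =
    TreeSet(Key(2, 0), Key(0, 1), Key(1, 5), Key(0, 0))(rowThenCol)

  // First set: keys in rows 0 and 1. Second set: keys in rows greater than 1.
  val (upTo1, after1) = keys.partition(_.row <= 1)
}
```

Both halves are still TreeSets, so they keep the row-then-column order.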

Monday, March 07, 2011

Scala Treeset example: filter, groupBy, foldLeft, sortBy, etc...

A Scala example doing various operations on a TreeSet of keys that are organized in rows and columns and that each have a type. The output is an HTML table.

def myPrint(i_type: Int, ts_keys: TreeSet[KbdKey]) {

  val lcol = (0 to 16)

  print(lcol.tail.foldLeft("Column [" + lcol.head +"]")(_ + "Column [" + _ +"]"))

  val ts_matrixFilteredByType = ts_keys.filter((k: KbdKey) => k.i_type==i_type)

  val m_matrixGroupByRow = ts_matrixFilteredByType.groupBy((k: KbdKey) => k.i_row)

  val m_matrixGroupBySortedRow = ListMap(m_matrixGroupByRow.toList.sortBy{_._1}:_*)

  m_matrixGroupBySortedRow.foreach((p:(Int, TreeSet[KbdKey])) =>
    print(p._2.tail.foldLeft("Row["+p._1+"]" + p._2.head.myprint)(_ + _.myprint) +"-"))
}

Things to notice are:
  • use of foldLeft to do some printing and not some plain summing "as usual".
  • groupBy returns a Map whose entries are (key, group) pairs.
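
The filter, groupBy, sortBy and foldLeft chain above can be tried in isolation with this minimal sketch. The Key case class, its fields and the text output format are hypothetical and stand in for KbdKey and its myprint method.

```scala
import scala.collection.immutable.{ListMap, TreeSet}

object GroupBySketch {
  // Hypothetical key type: row, column and a type code.
  case class Key(row: Int, col: Int, kind: Int)

  implicit val rowThenCol: Ordering[Key] = Ordering.by((k: Key) => (k.row, k.col))

  val keys: TreeSet[Key] =
    TreeSet(Key(1, 0, 7), Key(0, 2, 7), Key(0, 1, 7), Key(1, 1, 9))

  // Keep one kind, group by row, then fix the row order with a ListMap.
  def rowsOfKind(kind: Int): ListMap[Int, TreeSet[Key]] = {
    val byRow = keys.filter(_.kind == kind).groupBy(_.row)
    ListMap(byRow.toList.sortBy(_._1): _*)
  }

  // One string per row, built with foldLeft over the keys of that row.
  def render(kind: Int): List[String] =
    rowsOfKind(kind).toList.map { case (row, ks) =>
      ks.foldLeft("Row[" + row + "]")((s, k) => s + " c" + k.col)
    }
}
```

Each group returned by groupBy is itself a TreeSet, so the keys inside a row stay sorted by column.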

Scala and Java libraries in Eclipse

Don't try to mix Scala and Java in the same Scala project in Eclipse. At first you will have the feeling that it works, but as they are not compiled the same way, it's better to keep Java and Scala files in separate projects. The Scala project should have the Java project as a dependency.

To get it working, you have to clean both projects a lot to keep them "synchronized".

A Scala file using a Java class:
   import kbdmatrix_java._
   class KbdMatrix(val L: MyLog) {

The Java file:
   package kbdmatrix_java;
   public class MyLog {

Do not forget to declare the Java class and its methods as public.

Scala: code to find a worksheet name in an Excel workbook saved in Microsoft XML 2003 format

The function below checks whether a node is a Worksheet element whose ss:Name attribute has the given value:
def matchName(n: scala.xml.NodeSeq, s_attributeValue: String): Boolean = {
  n match {
    // Matches a Worksheet element whose attribute list starts with ss:Name
    case xml.Elem(_, "Worksheet", xml.PrefixedAttribute("ss", "Name", v, _), _, _*) =>
      v.text == s_attributeValue
    case _ => false
  }
}

Wednesday, February 23, 2011

Encryption is useless

What actually happened (Cnet 23Feb2011): Feds seek new ways to bypass encryption
SAN FRANCISCO--When agents at the Drug Enforcement Administration learned a suspect was using PGP to encrypt documents, they persuaded a judge to let them sneak into an office complex and install a keystroke logger that recorded the passphrase as it was typed in.

Read more:

Saturday, January 15, 2011


Here is an interesting way of displaying information that I found at
Click on the image below to dynamically see data:

Friday, January 07, 2011

Dimensions Treemap using Protovis

Based on the Treemap design from Protovis (JavaScript), I now have a treemap with 3 dimensions.

Click on the image below to see it in action (because my post does not allow it to be dynamically displayed).

The 3 dimensions are:
  1. the definition of groups that share the same kind of color
  2. the area of each square
  3. the darkness and saturation of each square
This is something that I use to display the status of issues on a bunch of projects. For instance, we develop several keyboards and mice at the same time.
  1. The yellow color group could contain all keyboard projects.
  2. The area of a square is proportional to the total number of bugs for this project.
  3. The darkness and saturation/alpha of a square are proportional to the number of open bugs for this project.