Benchmark of symbols and strings in Ruby

Friday, the end of the working day, no signs of trouble. "You've got mail" - Ruby Weekly. The latest much-discussed blog posts, jobs (skipping) ... wait, software engineer, London? A description of the vacancy and the firm ... and a problem, now that's interesting. "A local variable named log contains an array of hashes with timestamped events like ..." - piece of cake!

My first solution came quickly, but it only had to handle 8 elements. What if there are 10000008 elements? 32 seconds of calculation is too long. A simple optimisation, replacing 'merge' with 'merge!', brings it down to 22 seconds. No way!
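The difference between the two methods can be sketched as follows. This is only an illustration on hypothetical data (the three small hashes below are made up and are not the log from the task): 'merge' builds a new hash on every step of 'reduce', while 'merge!' updates the accumulator in place.

```ruby
# Hypothetical sample data, not the log from the task.
events = [{ a: 1 }, { b: 2 }, { c: 3 }]

# merge returns a brand new Hash on every step,
# so the growing accumulator is copied again and again.
copied = events.reduce({}) { |acc, h| acc.merge(h) }

# merge! mutates the accumulator and returns that same object,
# so no intermediate copies are made.
in_place = events.reduce({}) { |acc, h| acc.merge!(h) }

# Both approaches produce the same result.
puts copied == in_place

# The difference is visible on a single call:
acc = {}
fresh = acc.merge(a: 1)   # acc is left untouched
same  = acc.merge!(a: 1)  # acc itself is updated and returned
puts fresh.equal?(acc)
puts same.equal?(acc)
```

With millions of elements, the repeated copying done by 'merge' is what dominates the running time.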

All the way home I was thinking the problem over, trying hard to solve it, but everything was in vain. The answer came unexpectedly, as usual - #haskell@bynets. There I was reminded of the 'group_by' method. Of course, how could I forget about it? I won't give the solution in this post, so as not to spoil it (it will be in my application form).

All benchmarking was carried out on my home iMac (21.5-inch, Late 2012), 2.7 GHz i5, 8 GB RAM.

The solution (Ruby 2.1.5)

symbols merge:

   user     system      total        real
reduce symbols:  18.100000   1.420000  19.520000 ( 19.857926)

Not bad. Now let's change 'merge' to 'merge!':

symbols merge!:

   user     system      total        real
reduce symbols:   5.770000   0.060000   5.830000 (  5.821176)

For these tests I've used symbols as hash keys. As is well known, symbols aren't swept by the GC in this Ruby version while the program is running, so the memory they occupy is never freed.
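A small sketch of what makes symbols different from strings here. The key name 'timestamp' and the 'event_42' symbol are made up for illustration. Each symbol exists once per name and is registered in a global symbol table, while every separately built string is its own object that the GC can collect.

```ruby
# Two occurrences of the same symbol are the very same object...
same_symbol = :timestamp.equal?(:timestamp)

# ...while two strings built separately are distinct objects
# that merely have equal contents.
a = String.new('timestamp')
b = String.new('timestamp')
same_string   = a.equal?(b)
equal_content = (a == b)

# A symbol created at run time ends up in the global symbol table.
dynamic    = "event_#{42}".to_sym
registered = Symbol.all_symbols.include?(dynamic)

puts same_symbol    # same object
puts same_string    # different objects
puts equal_content  # same contents
puts registered     # present in the symbol table
```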

So why not switch the hash keys to strings, so that the memory can be reclaimed?

strings merge:

   user     system      total        real
reduce strings : 20.730000   0.610000  21.340000 ( 21.349411)

strings merge!:

   user     system      total        real
reduce strings :  8.940000   0.210000   9.150000 (  9.145551)

As I expected, the calculation took more time.
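A comparison of this kind can be reproduced with 'Benchmark.bm' from the standard library. The sketch below is not the code behind the numbers in this post: the input is hypothetical (single-pair hashes with made-up key names), far smaller than 10000008 elements, and only the 'merge!' variant is timed.

```ruby
require 'benchmark'

# Hypothetical input: n single-pair hashes, first with symbol keys,
# then the same data converted to string keys.
n = 200_000
symbol_data = Array.new(n) { |i| { :"key#{i % 100}" => i } }
string_data = symbol_data.map { |h| Hash[h.map { |k, v| [k.to_s, v] }] }

# The argument to bm is the width reserved for the labels.
Benchmark.bm(16) do |x|
  x.report('reduce symbols:') { symbol_data.reduce({}) { |acc, h| acc.merge!(h) } }
  x.report('reduce strings:') { string_data.reduce({}) { |acc, h| acc.merge!(h) } }
end
```

The absolute numbers depend on the machine and the Ruby version, so only the relation between the two rows is meaningful.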

The GC was reworked in Ruby 1.9, again in 2.1 (generational GC) and most recently in 2.2 (incremental marking and symbol GC; 2.2 is still in beta). Let's try Ruby 2.2 and see what results we get.

symbols merge:

   user     system      total        real
reduce symbols:  20.100000   0.170000  20.270000 ( 20.273468)

symbols merge!:

   user     system      total        real
reduce symbols:   6.620000   0.050000   6.670000 (  6.668298)

strings merge:

   user     system      total        real
reduce strings : 22.200000   0.210000  22.410000 ( 22.412580)

strings merge!:

   user     system      total        real
reduce strings :  6.610000   0.050000   6.660000 (  6.664764)

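Since the new GC in Ruby 2.2 sweeps symbols, the symbol-keyed runs are a bit slower than on Ruby 2.1.5 (6.67 vs 5.83 seconds in total for 'merge!'). The string-keyed 'merge!' run, on the other hand, got faster (6.66 vs 9.15 seconds) and now matches the symbol version.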

Conclusion

When we work with big data, we have to choose our tools carefully: a single method can make a great difference.

P.S. There is an excellent video about Ruby code optimisation.