Thursday 17 December 2015

Creating floating point arrays in Python

Note: all script output is included at the bottom of each code block and indicated with '>>>' (or sometimes the output is summarised in a code comment). Yes, it's confusing, but you're smart ;)
Let's imagine you want a simple array of consecutive floating point numbers in Python: [0.1, 0.2, 0.3 ... 0.9]. You start by trying to use Python's built-in range() function:
x = range(0.1,1,0.1) # Don't do this

Expecting to get an array of numbers from 0.1 (first argument, inclusive) to 1 (second argument, exclusive) with a step-size of 0.1 (third argument).
But the Python range function can only deal with integer step sizes, and complains:
>>> TypeError: range() integer step argument expected, got float.

OK, so you import numpy and use arange(), right?
import numpy as np
x = np.arange(0.1, 1, 0.1)
# x is [0.1, 0.2, ..., 0.9]

Exactly what you wanted. But what if you don't have numpy? What if you care about code footprint, portability, and all those things? What if you want someone else to be able to generate your super interesting array, and they don't have numpy? Do you ask them to install numpy, knowing their lives will be better in the long run? After internal debate, you delete the numpy dependency, and try a list comprehension instead:
x = [x * 0.1 for x in range(1,10)]

Hah, now you must surely have the best of all worlds. One line, no dependencies, and a list comprehension. This is great. This is amazing. You print it to double-check your handiwork:
print (x)
>>> [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9]

Wat?
Oh yes, computers suck at floating point numbers. Now what? Back to numpy? Round the numbers? Write a library to handle all of this? Re-write numpy? You try once more for a simple solution:
x = [x/10.0 for x in range(1,10)]
print(x)
>>> [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

Interesting. Division works where multiplication doesn't. You try out a few variations of each to make sure the distinction is consistent, and find out it is. You're kind of happy with the division solution, but floating point division is slower than floating pointmultiplication, right? What if someone wants to do this a million times? You decide to see how much time you're losing for floating point precision:
import time

N = 1000000
t1 = time.time()

for j in range(N):
foo = [x * 0.1 for x in range(1, 10)]
print("multiplication: {}".format(time.time() - t1))

t1 = time.time()
for j in range(N):
foo = [x / 10.0 for x in range(1, 10)]
print("division: {}".format(time.time() - t1))

>>> multiplication: 4.11618614197
>>> division: 4.24211502075

That .1 second for every run of a million hurts a bit. Division is slow. Why not just multiply the numbers as in the first attempt, and then round them to 2 places? Last try:
# ...
t1 = time.time()
for j in range(N):
foo = [round(x * 0.1, 2) for x in range(1, 10)]
print("round: {}".format(time.time() - t1))

No more slow division! And how long can it take to shave off some decimal places?
Quite a while, it turns out. Is that nearly 30 seconds? Yes, it is.
>>> multiplication: 4.11618614197
>>> division: 4.24211502075
>>> round: 29.5979890823

OK, that's slow. Slower than sub-string manipulation as it turns out. You convert the broken float to a string, take the first three characters off it, and then turn it back to a float, just to prove a point:
# ...
t1 = time.time()
for j in range(N):
foo = [float(str(x * 0.1)[:3]) for x in range(1, 10)]
print("string: {}".format(time.time() - t1))

>>> multiplication: 4.10650587082
>>> division: 4.21635198593
>>> round: 30.1995661259
>>> string: 23.5980100632

Now that you've put that 0.1 second paid for division into context, you feel OK using it. For comparison (even though you said 'last attempt' a while back), you add timing for numpy as well:
# ...
t1 = time.time()
for j in range(N):
foo = np.arange(0.1, 1, 0.1)
print("numpy: {}".format(time.time() - t1))

>>> multiplication: 4.17134094238
>>> division: 4.23171901703
>>> round: 36.1641287804
>>> string: 23.5332589149
>>> numpy: 3.10889482498

A whole second faster! Maybe you should include that massive dependency after all. But you remind yourself what you've read a million times in start-up blogs. The most important thing is code readability. The compiler will do optimization better than you can ever hope to, right? Right?? Or - maybe not.

Saturday 12 December 2015

Spritz and applications that use it

When Spritz announced their new speed-reading technology with a flashy website, an impressive demo, and some complimentary media articles, people noticed. They claimed that they would change the future of reading, and people believed them. I believed them. The premise is simple - words flash one-by-one in front of you on a fixed point, saving you the time you usually spend moving your eyes backwards and forwards while reading text in lines.

It's difficult to re-imagine ideas as popular as reading. Books have chapters, pages, paragraphs, and lines. They have many of these things not because they are inherently important to reading, but because they were necessary to print words out on paper. Some web pages still try to incorporate the idea of pagination - you get half way through a news article and then have to press "Go to page 2". Most people agree that pagination is generally A Bad Idea and no longer use it.

Two years since the Spritz announcement, and the technology built on Spritz is disappointing. There are some half-baked attempts to create speed-reader applications for most popular platforms, but nothing revolutionary. Most of these applications are just a wrapper of the Spritz demo that allow users to upload the content they want to read or find it online.

The Spritz applications that I find useful, but far from complete, are:
• The Spritz 'bookmarklet' (web) allows you to Spritz most text that you come across online, simply by selecting it and pressing the bookmark in your (desktop) browser
• Pros: Free, easy to install and use, fairly versatile
• Cons: Doesn't work on mobile devices, Isn't designed to read books or other files.
• SpeedRead (web) - a website that allows you to upload your own files (including PDFs), and read them with a very attractive modification of the standard Spritz interface.
• Pros: Looks good, works well, allows user files
• Cons: Not free, although the 'try it out' functionality has no time or number of use restrictions.