Note: all script output is included
at the bottom of each code block and indicated with '>>>' (or
sometimes the output is summarised in a code comment). Yes, it's confusing, but
you're smart ;)
Let's imagine you want a simple array
of consecutive floating point numbers in Python: [0.1, 0.2, 0.3 ... 0.9]. You
start by trying to use Python's built-in range() function:
x = range(0.1,1,0.1) # Don't do this
Expecting to get an array of numbers
from 0.1 (first argument, inclusive) to 1 (second argument, exclusive) with a
step-size of 0.1 (third argument).
But the Python range function can
only deal with integer step sizes, and complains:
>>> TypeError: range()
integer step argument expected, got float.
OK, so you import numpy and use arange(), right?
import numpy as np
x = np.arange(0.1, 1, 0.1)
# x is [0.1, 0.2, ..., 0.9]
Exactly what you wanted. But what if
you don't have numpy? What if you care about code footprint, portability, and
all those things? What if you want someone else to be able to generate your
super interesting array, and they don't have numpy? Do you ask them to install
numpy, knowing their lives will be better in the long run? After internal
debate, you delete the numpy dependency, and try a list comprehension instead:
x = [x * 0.1 for x in range(1,10)]
Hah, now you must surely have the
best of all worlds. One line, no dependencies, and a list comprehension. This
is great. This is amazing. You print it to double-check your handiwork:
print (x)
>>> [0.1, 0.2,
0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8,
0.9]
Wat?
Oh yes, computers suck at floating
point numbers. Now what? Back to numpy? Round the numbers? Write a library to
handle all of this? Re-write numpy? You try once more for a simple solution:
x = [x/10.0 for x in range(1,10)]
print(x)
>>> [0.1, 0.2, 0.3, 0.4,
0.5, 0.6, 0.7, 0.8, 0.9]
Interesting. Division works where
multiplication doesn't. You try out a few variations of each to make sure the
distinction is consistent, and find out it is. You're kind of happy with the
division solution, but floating point division is slower than floating pointmultiplication, right? What if someone wants to do this a million times? You
decide to see how much time you're losing for floating point precision:
import time
N = 1000000
t1 = time.time()
for j in range(N):
foo = [x * 0.1 for x in range(1, 10)]
print("multiplication:
{}".format(time.time() - t1))
t1 = time.time()
for j in range(N):
foo = [x / 10.0 for x in range(1, 10)]
print("division:
{}".format(time.time() - t1))
>>> multiplication:
4.11618614197
>>> division: 4.24211502075
That .1 second for every run of a
million hurts a bit. Division is slow. Why not just multiply the numbers as in
the first attempt, and then round them to 2 places? Last try:
# ...
t1 = time.time()
for j in range(N):
foo = [round(x * 0.1, 2) for x in range(1, 10)]
print("round:
{}".format(time.time() - t1))
No more slow division! And how long
can it take to shave off some decimal places?
Quite a while, it turns out. Is that
nearly 30 seconds? Yes, it is.
>>> multiplication:
4.11618614197
>>> division: 4.24211502075
>>> round: 29.5979890823
OK, that's slow. Slower than
sub-string manipulation as it turns out. You convert the broken float to a
string, take the first three characters off it, and then turn it back to a
float, just to prove a point:
# ...
t1 = time.time()
for j in range(N):
foo = [float(str(x * 0.1)[:3]) for x in range(1, 10)]
print("string:
{}".format(time.time() - t1))
>>> multiplication:
4.10650587082
>>> division: 4.21635198593
>>> round: 30.1995661259
>>> string: 23.5980100632
Now that you've put that 0.1 second
paid for division into context, you feel OK using it. For comparison (even
though you said 'last attempt' a while back), you add timing for numpy as well:
# ...
t1 = time.time()
for j in range(N):
foo = np.arange(0.1, 1, 0.1)
print("numpy:
{}".format(time.time() - t1))
>>> multiplication:
4.17134094238
>>> division: 4.23171901703
>>> round: 36.1641287804
>>> string: 23.5332589149
>>> numpy: 3.10889482498
A whole second faster! Maybe you
should include that massive dependency after all. But you remind yourself what
you've read a million times in start-up blogs. The most important thing is code
readability. The compiler will do optimization better than you can ever hope
to, right? Right?? Or - maybe not.
No comments:
Post a Comment