Sunday, 7 December 2014

Building a Twitter Bot with Python

South Africa has been facing some horrible load-shedding recently, with different areas experiencing complete blackouts for 2 hour periods up to 3 times a day. Our electricity 'provider', Eskom, publishes schedules about which areas will be without power and when, but these schedules depend on what stage of load-shedding they've decided to hit us with, with stages running from 0 (no load-shedding) to 3 (severe load-shedding). Unfortunately the stages change at short notice and the schedules are completely different depending on which stage we're in. I wanted push notifications whenever the stage changed, so I decided to use Python and Twitter to implement a quick solution.

I thought it would take a few minutes, but the Twitter API has become more complicated since last time I used it and it ended up taking me a couple of hours from idea to final(?) implementation. The bot can be found here: https://twitter.com/eskomstagealert.

Because it only tweets when the stage changes, I enabled tweet notifications for it from my main Twitter account and now in theory I should get a notification for every change. I say in theory because although it passed all my tests with flying colours, there hasn't actually been a stage change since I turned him on.

The process I used to write a Twitter bot is as follows. I assume you're using Ubuntu and Python 2.7, but you should be able to adapt everything fairly trivially if you're not.

1. Create and set up a new Twitter Account
This may sound pretty self-explanatory, but Twitter requires some admin to be used without a human's touch. Twitter no longer allows automatic login through an API using just your username and password. Instead, we'll need to create a "Twitter app" and generate some IDs, tokens, and keys.

First, verify the account by clicking on the confirmation link that Twitter emails to you when you first sign up. Now, go to https://apps.twitter.com/ and click the "Create New App" button. Fill out the name and description and any valid-looking URL for the website (we won't be using it, but it's a required field). 'Read' the Ts & Cs and click "Create your Twitter Application"

By default, the app only has read permissions. We'll be wanting to "write" (i.e., tweet), so we need to modify these. Unfortunately, Twitter demands your phone number to obtain Read & Write permissions, so you'll have to go to https://twitter.com/settings/add_phone and set that up.

Once you're done giving away your freedoms, go back to the Twitter apps page (https://apps.twitter.com/), and click on your app. Now click on the permissions tab, select the "read and write" radio button, and click "Update Settings".

Click on the "Keys and Access Tokens" tab and note where your 'Consumer Key (API Key)' and 'Consumer Secret (API Secret)' are. We'll come back to grab them in a bit. For now, just scroll down the page and click on the "Create my Access Token" button near the bottom. Above the button, you should now see some more info, including and 'Access Token' and an 'Access Token Secret'.

That's all the Twitter setup we need; let's get onto the Python.

2. Pythoning the Twitter Bot
There are a bunch of Python wrappers for the Twitter API. They all do pretty much the same thing, but I chose Twython, as it seemed well established, maintained and documented (and it had the cleverest name). To install simply run pip install twython. If you don't have pip, shame on you. Go install it now. If you're ever going to write another line of Python code in your life, it'll be time very well spent.

The Python is very straight forward. Obviously the Exception handling has room for improvement, but hey this was never meant to be an industry-grade project. Finding the STATUS_URL, and then working out how the numbers related to stages, took a bit of trawling through Eskom's badly-written JavaScript and some use of BurpSuite. (Eskom's main loadshedding page loads the "No Loadshedding" message by default and then updates the value using JavaScript which means that simply parsing the main page was a bit optimistic for my needs.) As we're only storing a single number at any given time, a database seemed overkill, so I use a text file instead. Just make sure that you have write permissions for FILE_PATH from the user that you're going to be running the script as. (If you're not sure, use something like '/home/$USER/last_status.txt'). Create the last_status.txt file manually in the appropriate place with an initial value of 5, as there's no allowance for the file not existing or being blank in the code below (and the 5 means that it'll definitely be different from the current stage, so you'll get to see the first automatic tweet straight away).

import urllib2
from twython import Twython

STATUS_URL = "http://loadshedding.eskom.co.za/LoadShedding/getstatus"
NUMBER_MESSAGE_MAP = {"1": "No Loadshedding :)",
                         "2": "Stage 1 loadshedding active",
                         "3": "Stage 2 loadshedding active",
                         "4": "Stage 3 loadshedding active :("}

FILE_PATH = "/data/eskom/last_status.txt"

def send_tweet(tweet_text):
    APP_KEY = "your_app_key -
Consumer Key (API Key)"
    APP_SECRET = "your_app_secret -
Consumer Secret (API Secret)"
    OAUTH_TOKEN = "your_access_token"
    OAUTH_TOKEN_SECRET = "your_access_token_secret"
    twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    twitter.update_status(status=tweet_text)

def get_html(url):
    response = urllib2.urlopen(url)
    page = response.read()
    return page

def get_previous_key():
    with open(FILE_PATH) as f:
        s = f.read()
    return s[0]

def update_stage_key(new_key):
    with open(FILE_PATH, "w") as f:
        f.write(new_key)

def main():
    try:
        previous_stage_key = get_previous_key()
        current_stage_key = get_html(STATUS_URL)
        current_message = NUMBER_MESSAGE_MAP[current_stage_key]
        if not current_stage_key == previous_stage_key:
            print "updating",current_message
            update_stage_key(current_stage_key)
            send_tweet(current_message)
    except Exception as e:
        print e

main()


3. Set up a Cron job
The script simply checks the current stage against the one it saw most recently. If they don't match, it saves the new one as the most recent one, and tweets a message about it. Therefore we need to run the script regularly in order to receive accurate updates. EskomStageAlert is quite diligent and eager to please, so he checks every minute for a change. To set your bot up for the same interval, do the following.

Run the command "crontab -e" to edit your cron file. Press "2" to edit the file in Nano if prompted (or if you know what you're doing select one of the other options), and append the following to the bottom of the file

* * * * * python /home/$USER/eskom.py >> /home/$USER/eskom_log.txt

The asterisks in the beginning indicate that the task should be run every minute. Then comes our main command which is python followed by the full path to our script (modify yours as necessary if you saved your script somewhere else). The >> is optional, but it'll direct the output of our script to a file so we can check if anything goes wrong. Note that a single > will overwrite the log file every time, while using two >> will append to the file).

And that's that. If all went well, your Twitter bot should be in action. Note that the API doesn't allow you to tweet a status identical to your previous one, so if you get a 403 Error, duplicate tweets could be the cause. If you get other errors, double check that you copied all of the API keys correctly (with no extra spaces or bits missing), and if that doesn't fix the issue then try regenerating them through the Twitter apps page.

To finish off, simply visit your Twitter bot from your main Twitter account, using the Twitter mobile app. Press the "Follow" button and then the grey star next to it (on the Android app in any case. This might differ for other mobile apps). Now you'll get push Twitter notifications whenever your Bot tweets!


* One of the few gripes I have with Ubuntu is their choice to use Nano as a recommended default for cron instead of vim, which is better, but let's not get into text editor flame wars here.

Sunday, 16 November 2014

Microsoft and Google reversed

Microsoft has got quite a bad name for itself in many tech-circles. It's seen as a big profit-focused corporation, and instead of the amusing April Fool's jokes that Google treats us with, it gave us things like Clippy and the Blue Screen of Death (BSoD). Microsoft is also no stranger to Vendor Lock-In - people hate some of its products, but use them anyway for compatibility.

Google has always gone to great lengths to show that it is not like this. "Don't be evil" is it's first and foremost motto, and its employees wear jeans instead of suits. Occasionally it messes up. For example, Google recently sent thousands of cars driving around the world to collect imagery for Google Maps. A "bug" in the image collecting code meant that the cars also collected and stored all open WiFi data. But that was just a mistake, and everyone except Germany laughed about it.

But now? Things seem to be changing. Or completely reversing, possibly. Microsoft announced plans to open-source the .Net Framework, released free Android and OSx Office Apps, and told us about plans to support Android development in its next version of Visual Studio, a version which will have a free "community" edition. And Google released Google Inbox, yet another take on Email, which only works with Chrome and only if you have a phone running Android (yes, you actually need the Android phone to "activate" Inbox, even to use the Desktop version, for which you need Google Chrome.)

Google also showed off the power it holds over billions of people with Google Plus. Everyone hated it. It was a rehashing of Facebook, and no-one wanted to move. Now, everyone has a Google Plus account. Why? Because Google became the Jehovah's Witnesses of the Internet. There was barely a safe-place to visit without highly irritating alerts telling to you sign up for G+. When even that didn't work, Google automatically signed the survivors of the resistance up.

And yet Google is not even mentioned on the Vendor Lock-In page linked to above.


Unfortunately it's already too late. Google's products work too well with each other for complete abandonment. Google searches are faster in Chrome. Google Now tells me when to leave my house in order to catch my plane in time automatically. It finds my ticket in Gmail; uses my map data to work out where I live; uses others' map data to work out if there's traffic; it knows my preferred means of transportation, and it knows if the plane is delayed. What would take me hours to find out manually is told to me when I need it without any effort. And then there's good old search for which Google originally became famous. I can just mash the keyboard almost at random and I'll get the results I'm looking for, whereas other search engines require me to carefully think about the search terms or they return completely uninteresting results. I tried "going Google Free" a while back, and switched to Firefox and DuckDuckGo. It was a disaster.

In conclusion, I hate Google and their new tactics. But it's too late. "The emperor has already won".

Saturday, 11 October 2014

P2P throttling and Transmission

Peer to peer technology such as Bittorrent has gotten itself a bit of a bad name, because it is often used to obtain illegal copies of movies, and to otherwise infringe on the intellectual property rights of others. However it is a great way to share large files among a lot of people, and it has plenty of legitimate uses, such as downloading free operating systems and breaking through censorship.
In countries like South Africa, where I live, many ISPs throttle all P2P traffic, especially for uncapped accounts. This means that even if you're paying a small fortune for what first-world countries wouldn't consider an Internet connetion (a 1, 2, or 4 MB line), you may get fairly decent download speeds in most cases, but have all P2P traffic throttled to such an extent that it becomes unusable (many people report speeds as low as 5 KB/s for even well-seeded torrents). One way around this is to download files via P2P to another country, and then direct download from there. Even though you're doing twice as much work, it could well be faster, for unthrottled P2P can be lightning fast, as can a direct download from a dedicated server.
In this post, I look at how to set up a system that allows you to easily do this. We'll be using
  • A Digital Ocean virtual private server (VPS) running Ubuntu-Server 14.10
  • Transmission bit-torrent client (via web interface)
  • vsftpd (very secure FTP daemon)

Transmission

The first step is to install Transmission. This is a cross-platform bit-torrent client, which has an easy to use web interface. These steps are adapted from Daniel Morgan's blog post here.
I'll assume that creating and connecting to the VPS isn't a problem, but see the tutorial here if you need help with that.
Once you're connected to your VPS, run an apt-get update:
sudo apt-get update
Then install Transmission
sudo apt-get install transmission-daemon transmission-common
Now we'll set a username and password to access the web-interface for Transmission. First, we have to stop the transmission daemon, which started automatically when we installed it.
sudo service transmission-daemon stop
Now, edit the config file:
sudo nano /etc/transmission-daemon/settings.json
First, edit the "rpc-bind-address" line, to match the IP address of your VPS. To access the interface from another machine, edit the "rpc-whitelist-enabled" line to false.
Then look for the line that says "rpc-password", and edit the value to a password of your choice. It should look something like the following:
Before your edit:
....
"rpc-password": "{c073045d97f41b82f258e1e204d387f8299a7b22.h73jbg6"
....
After your edit:
....
"rpc-password": "alwaysUseAStrongPassword++^"
....
This password is automatically encrypted, so next time you visit hte file, it'll look similar to the original again.
Optionally, also edit the "rpc-username" field so you can log in with a different username, and if you're worried about using excessive bandwidth on your VPS, set the "ratio-limit-enabled" to true. The default value of 2 ("ratio-limit") means that you'll never upload more than twice the amount of data that you download for any given torrent.
Hit Ctrl + X and then Y to exit and save changes. Once you've done this, restart the transmission daemon with:
sudo service transmission-daemon start
To test that it's working, visit http://123.456.789.123:9091 (where the first part is your droplet's IP address) in your browser. You should see a username/password authentication box. Enter "transmission" and the password you set.
You should now see a web interface. In the top right-hand corner there's an "upload" button, which allows you to upload a torrent file to download. Magnet links also work fine - just paste them into the "or enter a URL" field, and press "upload".
This allows you to download files via P2P onto your VPS. But how do we get them off again. You can use SFTP or WinSCP, etc, where you can log onto your droplet directly via SSH and download files. If you want to download files through your browser too, it's easy to set up an FTP server. Keep in mind that FTP is fairly insecure though.

vsftpd

Installation is as simple as ever:
sudo apt-get install vsftpd
We just need to edit the config file to automatically make available the files that we download through transmission. The default conf file is quite long and full of options, but we'll just move it out the way and use some minimal options.
sudo mv /etc/vsftpd.conf /etc/vsftpd.conf.backup
sudo nano /etc/vsftpd.conf
and paste the following:
listen=YES
local_enable=YES
anonymous_enable=NO
write_enable=NO
local_root=/var/lib/transmission-daemon/downloads/
Then run
sudo service vsftpd
To reload with the new configuration.
Now anything you download via Transmission will be available at: ftp://123.456.789.123/ (again, substitute your IP). When you visit the FTP page, you will be asked for a username and password. You can use any user account on your VPS. If you've only got a root account (a terrible idea, but default on the DO droplets), create a new account:
sudo adduser johnsmith
Enter a new password (twice), and leave the other fields blank when prompted (just press Enter). Now you can use this username and password to access your files.

Saturday, 26 July 2014

Pretty Python Progress


Often when I write loops in Python I want to how much progress has been made. A simple way is to put a counter and a print statement:
import time
progress = 0
thingsToProcess = range(543)
for thingToProcess in thingsToProcess:
    progress += 1
    print "%s/%s" % (progress, len(thingsToProcess))
    time.sleep(0.05)
Or in fewer lines:
import time
thingsToProcess = range(543)
for progress, thingToProcess in enumerate(thingsToProcess):
    print "%s/%s" % (progress, len(thingsToProcess))
    time.sleep(0.05)
Which is OK, but it spams the output screen with a line for each iteration. We can improve a little bit by only printing on every Nth iteration:
import time
thingsToProcess = range(543)
for progress, thingToProcess in enumerate(thingsToProcess):
    if progress % 10 == 0:
        print "%s/%s" % (progress, len(thingsToProcess))
    time.sleep(0.05)
But we can do even better with not too much effort. Let's build progress bars and percentage counters! We use ANSI codes to print nice(?) colours in the terminal, and the "\r" special character (carriage return) to print over the same line in the terminal. We also need to "flush" the standard output on each print, or it will get buffered automatically and we'll only see the final line. Note also the comma at the end of the print statement in print_progress - this suppresses the newline character. To use, just call the print_progress function from inside any loop where:
  • You know the index of the current iteration
  • You know how many iterations there will be in total
  • There are no other print statements in the loop
Just change the string argument "colour" ("cyan") to any of the colours in the dictionary defined in print_progress to see the progress in other nice(?) colours. By subtracting 60 from each number in the colour dictionary a different shade of that colour will be printed instead (e.g. try using "32" instead of "92" for Green).
import sys
import time

def print_progress(current, total, colour=""):
    current += 1 # be optimistic so we finish on 100 
    colours = {"":0, "black":90, "red":91, "green":92, "yellow":93, "blue":94, "purple":95, "cyan":96, "white":97}
    COLOUR_START = '\033[%sm' % (colours.get(colour))
    COLOUR_END = '\033[0m'
    percent_float = float(current)/float(total) * 100
    percent = "%.1f" % percent_float
    bar = "|%s>%s|" % ("-" * int(percent_float/4), " " * (25 - int(percent_float/4))) 
    print "\r%s%s / %s - %s%% %s %s" % (COLOUR_START, current, total, percent, bar, COLOUR_END),
    sys.stdout.flush()

thingsToProcess = range(145)
for progress,value in enumerate(thingsToProcess):
    print_progress(progress, len(thingsToProcess), "cyan")
    time.sleep(0.05)

Edit: I updated code, which adds time remaining and uses a Class. Less hacky, more efficient, better. See demo function for example usage. Full listing below.

# Gareth Dwyer, 2014
# A simple progress bar for Python for loops, featuring
#   * Percentage counter
#   * ASCII bar
#   * Time remaining
#   * Customizable additional info (display last data processed)
#   * Customizable colours
#   

import sys
import time
from datetime import datetime, timedelta

def convert_seconds(num_seconds):
    """ convert seconds to days, hours, minutes, and seconds, as appropriate"""
    sec = timedelta(seconds=num_seconds)
    d = datetime(1,1,1) + sec
    return ("%dd %dh %dm %ds" % (d.day-1, d.hour, d.minute, d.second))

def run_demo():
    """ create a ProgressBar and run """
    pb = ProgressBar("cyan")
    data = range(30,56)
    for i,v in enumerate(data):
        time.sleep(0.5)
        pb.print_progress(i, len(data), v)

class ProgressBar:

    def __init__(self, colour="green"):
        """ Create a progress bar and initalise start time of task """
        self.start_time = time.time()
        self.colours = {"":0, "black":90, "red":91, "green":92, "yellow":93, "blue":94, "purple":95, "cyan":96, "white":97}
        self.start_colour = "\033[%sm" % (self.colours.get(colour))
        self.end_colour = "\033[0m"

    def print_progress(self, current, total, additional_info=""):
        """ Call inside for loop, passing current index and total length of iterable """
        if additional_info:
            additional_info = "[%s]" % additional_info
        current += 1 # be optimistic so we finish on 100 
        percent = float(current)/float(total) * 100
        remaining_time = convert_seconds((100 - percent) * (time.time() - self.start_time)/percent)
        percent_string = "%.1f" % percent
        bar = "|%s>%s|" % ("-" * int(percent/4), " " * (25 - int(percent/4))) 
        print "\r%s%s / %s - %s%% %s %s remaining: %s %s" % (self.start_colour, current, total, percent_string, bar, additional_info, remaining_time, self.end_colour),
        sys.stdout.flush()

if __name__ == '__main__':
    run_demo()

Saturday, 19 July 2014

How (not) to impress potential employees

Maybe you're involved in a company that is interested in employing computer science students. Maybe one day you will be. I've just come back from a week-long 'field trip' which had little to do with fields, but involved various companies in Cape Town doing their utmost to impress me and my classmates and to persuade us to apply to work for them. Some rose spectacularly to the occasion, while others ... didn't. I couldn't help but notice some very clearly defined differences between the two extremes, so here's a (relatively) brief how-to on impressing students.

The companies we visited were BSG, Amazon, Korbitec, Centre for High Performance Computing, Bandwidth Barn, Open Box, and KPMG, and each had us in their offices for about half a day. While I won't go in to any too much "naming and shaming", there are some definite yeses and nos about how to host a group of students / potential employees.

No slides
You can't possibly explain what you're all about without PowerPoint, right? Wrong. At two of the companies, we didn't see a single presentation, and it's no coincidence that these are the two companies where the class wasn't bored, playing on laptops, or sending witty messages (largely highly insulting to the company) on the class WhatsApp group. While anyone can give a PowerPoint presentation, it takes someone with some public speaking skills to talk to people informatively without hiding behind (in front of) slides. But even if you only have one person who can do this effectively, make sure they are available when the students arrive.

Definitely no software/tutorial slides
We're comp sci students. We've seen a lot of slides about how to use software and how computers or computing concepts work. This is our vacation, and we're here to hear about your company - not about Scrum methodology or frameworks that were the latest and greatest 5 years ago and which your company still thinks are worth talking about. One company had someone present a slideshow about AngularJS - a slideshow that the presenter happily admitted to having scrounged from the top Google result because he didn't have time to throw anything together himself. He used phrases such as "I'm not quite sure what this variable does, it was declared further up, ummm, I think". 

BSG tried to give us a miniature "workshop", which involved over four hours of slides and incompetent speakers. We were highly amused to discover an article on their website afterwards, claiming that the event had been a complete success. They went so far as to say that: "When asked whether they would like to work at BSG when they graduated, the students at the event were unanimous". And we were even more amused to see that they'd put a photo of the Information Systems class instead of a photo of us.

Don't appear stingy
We're all living off student budgets at the moment. When we open a menu at a restaurant, we automatically scan the price column for potential meals, and then look to the description for confirmation. Money still has a sense of intrigue. One company provided a couple of bottles of champagne, enough pizza to carpet my diggs, a wide variety of drinks which we failed to finish, and yo-yos. We were impressed and the yo-yos were played with during boring visits to other companies. Another company asked for the name-tags they'd given us back again, as they wanted to reuse the safety pins. We started joking in horrible ways about them before we were out of earshot. 

Allow us to interact with your employees
Anyone can make a company seem glamorous for a couple of hours. If we only see one room and two people, we are immediately suspicious about working conditions for the rest of the staff. Once we've heard the (short) introduction, give us food and invite your other staff too. We want to be able to engage in one-on-one casual conversations with people who are working there to get a less biased impression of your company. 

Tell us how much money we stand to make
Money is important to us. Every company without exception offered "a competitive starting salary". This exact phrase was pronounced by presenters, provided on pamphlets, and printed on posters. When asked for a ballpark figure, there were hushed silences, nervous giggles, and a reply of "we can't really talk about that". Rumours get around - we know, or think we know, what you're paying. We may well be wrong. But none of us are going to send in our CVs if the rumours are that you've given up on South African Rands altogether and are instead counting out ground-grown legumes for your employees at the end of each month. We know that starting salaries may differ even within your company - but give us some idea. You ask for our exam marks - imagine if we put on our CVs "above average exam marks, with competitive additional achievements".

Spend five minutes finding out who we are
We're constantly reminded that the worst thing we can do in an interview or cover letter is confuse your company with your competitor. If I walk into a Korbitec interview and tell them how excited I am about working for BSG, it's over. But we had presenters assume we were from UCT, be under the assumption that we were all Information System students, and even had one that gave us a whole lecture on Astronomy and Physics. On the other end of the spectrum, one company had asked our department to send in summaries of our honours projects, and the CEO spoke to some of us individually about what we were working on (while we ate pizza and drank beer). 

Give us nice toys
We've got a lot of pens, and company-branded lip-ice is not really our thing. We're not going to walk around with our keys on your company's lanyards. But again, we're students. You don't need to go all-out and buy us all new laptops, cars, and houses. Hoodies are great; one company gave us high-quality touch-screen styluses; even the sunglasses and yo-yos were used and not just chucked in the nearest bin (sometimes even bins in your offices, though mostly we were polite enough to use the ones outside). 

And finally, avoid clich├ęs "like the plague". Every company "does things a little bit differently", they all "encourage growth in their employees", "think outside the box", and "have a strong employee focus". We know you like to think that you "empower us" to "reach our full potential"; that you are kind of into "viable solutions", "understanding culture", and "leveraging opportunities". We don't care about "sector specialists", "open mindsets" that are "essentially very powerful", and we don't believe that you are "all about people". Your "company vision" is meaningless, and I doubt you could give an acceptable definition of "empathetic" if actually asked about it. Be straightforward with us, we're all intelligent enough to smell bullshit when it's shoved under our noses.

SSH Tunnelling for web access

Today I set up an SSH tunnel for the first time, and I was surprised at how easy it was! Using nothing but a simple SSH command and Firefox, you can route all your web traffic over an SSH connection, ensuring that it is all encrypted, and bypassing petty firewall rules. Completely hypothetically, this could also be used to gain access to a WiFi connection which allows SSH connections but redirects all HTTP requests to a "please sign up with your credit card details to access our slow WiFi at extortionate rates" page (as is the case with many public WiFi hotspots). I hope I need not reassure my readers that this is definitely not, in any way, why I needed an SSH tunnel.

What you need:
  • A computer located anywhere in the world with unfettered access to the internet, a static IP, and which is capable of accepting SSH connections. *
  • A computer which has restricted access to the internet.
  • Mozilla Firefox.
  • PuTTY if the machine from 2) is running Windows.
If the restricted machine is running Linux, simply open a terminal and enter the command:

ssh -D 8080 user@123.456.789.123

where 'user' and the IP address are those for the unrestricted computer. This sets up a dynamic SSH connection which tunnels all traffic sent to port 8080 via SSH to the unrestricted machine. No set up on the remote machine is needed at all!

Now open Firefox and go to Options (or Preferences) -> Network -> Settings. Set to "Manual proxy configuration", fill in the SOCKS Host with "localhost" and the port with "8080". Leave the "HTTP Proxy" field blank. Press OK.


That's it. You should now have full web access through Firefox over the SOCKS proxy via the SSH tunnel!

If your restricted machine is running Windows, then you need PuTTY to make the SSH connection. Put the IP address of your unrestricted machine for "Host name (or IP Address)", then go to Connection -> Data in the tree menu on the left, and put the username for the unrestricted machine in the "Auto-login username" field. Finally go to Connection -> SSH -> Tunnels, put 8080 in the "Source port" field, select the "Dynamic" radio button, and hit the "Add" button. Press "Open" to open the SSH connection to the unrestricted machine. Firefox should now have full web access via SSH.



* It's only $5/month for a digital ocean VPS. These work brilliantly for SSH tunnelling, as they have SSH access set up by default, and Digital Ocean is currently not charging for excess bandwidth. Here's my referral link for your convenience: https://www.digitalocean.com/?refcode=d7616f10aa59
If you use this link, I'll get $25 once you've spent $25 after signing up.

Thursday, 12 June 2014

Facebook pushing more boundaries

Facebook used to look at your activity on Facebook and store this and analyse this and use this to show you 'relevant' adverts.

As of now, Facebook admits to looking at your activity across various websites, stalking your every move in an attempt to get to know you better, and to show you an ad that you might just click on. Maybe. One day. After all, they need to make money somehow.

For a change, the PCWorld article about this change is actually less Facebook propaganda heavy than the Slashdot one. It notes the 'extra control given' that Facebook is claiming, but also points out to what extent Facebook is actually watching your activity on the Web:
http://www.pcworld.com/article/2362629/facebook-changes-the-way-it-tracks-and-serves-you-ads.html

This Slashdot contributor however just buys the propaganda completely. You can 'opt-out'. Isn't that reassuring.

It turns out, not so much. You can opt-out on a specific device, using a specific browser. All this does is store a cookie, and Facebook then promises to not use information gathered while the cookie is set to give you adverts (although if they did, it would be very hard to tell). If you use another device, a different browser on the same device, or if you clear your cookies, you'll have to 'opt-out' again.

And you don't opt-out of the tracking itself. Oh no - just out of allowing them to use that information. Nothing about actually storing it. See here to read what I'm talking about and to press that opt-out button if it makes you feel better: https://www.facebook.com/ads/website_custom_audiences/. It'll probably feel a bit like pressing one of those pedestrian buttons at traffic lights in this country: it might not even be wired to anything.

So again, if you want privacy, stay off the 'Net. You're not going to find it around here.

Thursday, 29 May 2014

Rant 2 (Java)

I have been coding mainly in Python for a while now, using Sublime Text instead of an IDE. Recently, I have been looking at Java again, in order to develop Android applications. Now I understand all the benefits of Object Oriented Programming, generalization, abstraction, etc, but after coding pythonically for a while, Java programmers seem to overdo a little bit, sometimes. And by a little bit, I mean a lot, and by sometimes I mean nearly always. It feels like you can't do anything without implementing three abstract classes, extending two helper classes and creating a connector class, with a communicator class, and a helper class for each of these, and then some helper methods to help the helpers help other helpers. Just like big organizations, it becomes very difficult to find out where responsibility is actually being taken, instead of passed on to someone else, and the hierarchies become taller than your average Ent.

A good example is found in a newly created Android Project in Eclipse with the ADT plugin. The project space is seen below:
Hierarchies...
Here we can see a pretty impressive hierarchy. There's a directory called gen for Generated Java Files, it contains a single directory called android.support.v7.appcompat, which contains a single Java class file called BuildConfig.java, which contains a single class called BuildConfig, which contains a single variable (a boolean), which is set to true. Yes, that's a 5 layer hierarchy to store a single boolean constant.

/rant

Saturday, 24 May 2014

Subconscious interaction

There are two types of people in this world, technology natives and technology immigrants. The first have grown up with technology, while the second have been introduced to it. While there are many ways to identify a native, the easiest way is to watch how he or she handles dialogue boxes. Immigrants will always read every word on a dialogue box, sometimes even hovering the mouse pointer underneath the word they are currently reading ("Would __ you __ like __ to __ save __changes __ to __ 'letter to Mr Jones written on 22 May 2014 revision 2 (2).2.docx'"), while natives subconsciously hit the desired option after only a glance at the text and buttons. "How did you know you had to press that button?" I sometimes get asked by an immigrant. I think about it, and work out I have no idea -- it's just a natural action like taking a step forward or chewing a mouthful of food. It's actually more difficult if I consciously think about it.

But then there are some really, really badly worded dialogues. And when I encounter these I feel like an immigrant. For example, once an FNB ATM has given you cash it will display "Select yes if you prefer a receipt for this transaction". What really rankles about this is that I can only imagine how many board meetings went into the design of this one dialogue. "Wait, we need to save paper, so let's try to discourage people from having a receipt. You need to opt-in to getting a receipt, rather than opt-out." "OK, but that's a bit confusing. It needs to be clear, as not everyone will speak English as a first language. Imagine if they got a receipt when they wanted not to. That would be a calamity". "I agree, let's get some more caviar in here. I've just polished this lot off."

Either "Would you like a receipt?" got rejected, or was never thought of in the first place. Every single time I see "Select yes if you would prefer a receipt for this transaction" I have to think about it. I stand in front of the ATM like an idiot, reading and rereading the simple message, while my brain switches over from subconscious to concious control, trying to figure out if I need to press Yes or No. A simple "Would you like a receipt?" or even just "Receipt?" would prevent this, and in an ideal world where that was the case, I would currently have an extra several minutes of my life to spend however I wanted. In fact I would have more time than that, because in addition to the several seconds for each use of an FNB ATM, I would also have the time taken to write this post.

</rant>

Monday, 19 May 2014

Moving files from Dropbox to Mega

Note - I have not actually yet managed to achieve what I set out to do, but below is a good starting point for using the Dropbox and Mega APIs through Python.

My Dropbox 50GB 2 year trial period expires in a couple of weeks' time. Mega however offers 50GB for free with no current time limit.

It makes sense then to migrate from Dropbox to Mega. Unfortunately the number of files I have on Dropbox means that it would be a pain to do this manually. Luckily both services provide APIs, and client libraries exist for Python. It would be nice if one could do this to move all Dropbox files to Mega, maintaining the directory hierarchy.

import mega
import dropbox

m = mega.login(username, password)
d = dropbox.login(username, password)

for f in d.get_files():
    m.upload(f)

Unfortunately, one can't. It's a bit more complicated.

The first step is to head over to the Dropbox website and get some API keys. Go here https://www.dropbox.com/developers/apps, click on Create App and select Dropbox API App.

Choose the following settings:
  • Files and Datastores (What type of data does you app need to store on Dropbox?)
  • No (Can your app be limited to its own folder?)
  • All file types (What type of files does your app need access to?)
You should be taken to a page which contains, among other things, an App key and an App secret. Take note of these.

Install the relevant Python packages:

pip install dropbox
pip install mega

Create a file called dropboxtomega.py and open it in your favourite text editor, Sublime Text. (If this isn't your favourite text editor, give it a try; it soon will be).

The following code loops through all the files in your Dropbox account and saves them to the local folder, perfectly maintaining the directory structure. This isn't what we want to do (if we did, we'd just have used the official Dropbox sync app), but it's a good starting point.

import dropbox
import os
import mega

def recurse_folder(client, path, depth=0):
  folder_metadata = client.metadata(path)
  contents = folder_metadata.get("contents")
  for item in contents:
    if item.get("is_dir"):
      dirname = item.get("path")[1:] # remove leading slash
      print ".." * depth + dirname
      if not os.path.exists(dirname):
        os.makedirs(dirname)
      recurse_folder(client, item.get("path"), depth+1)
          else:
      fpath = item.get("path")
      print ".." * depth + fpath
      f = client.get_file(fpath)
      with open(fpath[1:], 'wb') as out:
        out.write(f.read())

app_key = 'xxxxxxxxxx'

app_secret = 'xxxxxxxxxxxxxxx'

flow = dropbox.client.DropboxOAuth2FlowNoRedirect(app_key, app_secret)

# Have the user sign in and authorize this token
authorize_url = flow.start()
print '1. Go to: ' + authorize_url
print '2. Click "Allow" (you might have to log in first)'
print '3. Copy the authorization code.'

code = raw_input("Enter the authorization code here: ").strip()
access_token, user_id = flow.finish(code)

recurse_folder(client, "/")

If we can save to disk keeping directory hierarchy, we should be able to do the same thing using Mega instead of local storage, right? Right??

Wrong. Unfortunately.

Although the Mega API does provide the functionality to create folders and to save files to specific folders, this doesn't work too well with the library I'm using. Let's leave Dropbox for now and take a look at Mega:

mega = Mega({'verbose':True}) #shows upload progress
m = mega.login("yourname@youremail.com","yourmegapassword")

m.upload("path/to/file.ext")

Looks simple, right? No API keys or access tokens. It Just Works. To create a directory I should be able to do:

m.create_folder("my_folder")

Which works. Then I should also be able to do this:

m.create_folder("my_sub_folder", dest="my_folder")

Which doesn't. It seems to succeed but the folder does not appear. I should also be able to do this:

m.upload("myfile.txt",dest="my_folder")

Which throws a timeout error. Just when things looked like they would be easy.

Although we seem to be having difficulties moving the files and maintaining directory structure, we can still move the files, abandoning our hierarchy. This could be useful if, for example, the script had to run on a machine with less hard drive space available than the total amount of data stored in Dropbox. The recurse function used to upload is as follows. The biggest disadvantage of this is that it requires that None of your files in Dropbox have the same name, even if they are in different directories. It would be trivial to catch exceptions and append a -1, -2, etc to the end of such files, but that's hacky enough to make even me cringe. Note that even though the response from the Dropbox API seems to be a Python file object, it is in fact a custom REST Response object. The easiest way to ensure the data is a the format needed by the Mega API is to save the object to a temporary operating system file and to upload it that. This does add a lot of unnecessary disk IO, and there may well be a better way of converting the REST object to a Python file object.

def recurse_folder(client, path, depth=0):
  folder_metadata = client.metadata(path)
  contents = folder_metadata.get("contents")
  for item in contents:
    if item.get("is_dir"):
      dirname = item.get("path")[1:] # remove leading slash
      print ".." * depth + dirname
      recurse_folder(client, item.get("path"), depth+1)
    else:
      fpath = item.get("path")
      print ".." * depth + fpath
      f = client.get_file(fpath)
      with open("tempfile", 'wb') as out:
        out.write(f.read())
      try:
        fname = fpath.split("/")[-1]
      except:
        fname = fpath
      with open("tempfile") as f:
        m.upload("tempfile", dest_filename=fname, input_file=f)


Finale
I thought it would be simple. It wasn't.

The easy route out. Download the Dropbox sync app (which you probably have already if you've been using Dropbox). Download the Mega sync app. Once all your Dropbox files are synced to a local Dropbox folder, copy them to a local Mega folder, and allow the Mega app to sync back to the cloud.

Pros: fast, easy, likely to work
Cons: You don't get to mess around with Python and APIs

Your  choice.

Thursday, 3 April 2014

The poor man's 'static' IP

I recently discovered that to access my Calibre ebook collection on the go was as simple as hitting the "Start Content Server" button, and turning my laptop into a private ebook server.

Actually it wasn't quite that simple.

The problem: I wanted this server to run off my laptop, which is behind a router, behind an ISP. The router gives out dynamic IP addresses to the subnet, and the ISP assigns a dynamic IP address to the router.

The solution:

The first part was easy. I logged into the router and reserved an IP address for my laptop's MAC address. So far so good; now I only had one dynamic IP to worry about.

There are a couple of services which offer to take care of this part for you. For example: no-ip.com or dyn.com/dns. But they either want to charge you money or offer you a watered-down free version of their products. So I went back to the drawing board, which for me is generally a blank Python script. First step, getting my public ip address:

from urllib2 import urlopen
my_public_ip = urlopen("http://wtfismyip.com/text").read()
print my_public_ip.strip()

The homepage of wtfismyip.com is worth a look too.

Now we can run this as a scheduled task or cron job, and send it somewhere accessible. An Email? But we don't want to spam ourselves with an email every five minutes. The easiest way to check if the IP has changed is to save it to a local text file, check the current IP against the stored one, and only send the email if it has changed (I haven't done any error checking, but you should probably add some).

import smtplib
from urllib2 import urlopen

fromaddr = 'yourgmailaccount@gmail.com'
toaddr  = 'yourgmailaccount@gmail.com'

with open("ip.txt") as f:
    old_ip = f.read()

current_ip = urlopen("http://wtfismyip.com/text").read().strip()

if current_ip != old_ip:
    with open("ip.txt", 'w') as f:
        f.write(current_ip)
    msg = ('Subject: IP Change\n\n%s' % current_ip)
    username = 'yourgmailaccount@gmail.com'
    password = 'yourgmailpassword'
    server = smtplib.SMTP('smtp.gmail.com', '587')
    server.starttls()
    server.login(username,password)
    server.sendmail(fromaddr, toaddrs, msg)
    server.quit()

And now whenever my public ip changes, I'll get an email notification and be able to still log in to the ebook server.

Actually, I do have a static IP address from digitalocean.com, but I can't use it for the ebook server due to space restrictions. But they offer a VPS at just $5/month, with a 20GB SSD, 1TB bandwidth and 512MB Ram. Also, I get free stuff if you use the link above ;)

So instead of email, my solution was using my favourite web development combo: a Flask app with MongoDB. Obviously this would be overkill if it weren't all set up already, but for interest's sake, it took very few lines of code to simulate the wtfismyip.com functionality, and to insert my public IP into the mongo database when it changed (using a similar scheduled task to that described above, but without the need to send emails.) Then when visiting the /ebookip route, it returns a hyperlink to my laptop's public ip (with Calibre running on port 8080).

# c is the connection to the Mongo database
@app.route("/updateip")
def update_ip():
        c.ips.remove()
        c.ips.insert({"ip":request.remote_addr})
        return request.remote_addr

@app.route("/ebookip")
def ebookip():
        ip = c.ips.find_one().get("ip")
        return "<a href='http://%s:8080'>%s:8080</a>" % (ip, ip)

@app.route("/whatismyip")
def whatismyip():
        return request.remote_addr

Obviously this is horrible from a security point of view. If anyone visits the /updateip route, it assume that my server is now hosted at their IP (until my scheduled task runs again and resets it to mine). The simplest and worst way around this is to rely on security through obscurity and make the /updateip route much longer. I would not recommend this. Better, add user account authentication, which is surprisingly easy in Flask.

And that's that. If nothing else, I hope that at least the Python Gmail script is useful.

Wednesday, 5 February 2014

Computers and Questions

"Computers are useless. They can only give you answers." -- Pablo Picasso

Picasso nailed it philosophically. But computers are even more useless when they try to ask questions. From the vaguely annoying "Do you want to restart your computer now?" (No), through "Do you want to automatically recover your [document/tabs/settings]?" (No), through "Do you want to install updates?" (No), through "Do you want to subscribe to the pro version?" (No) all the way to "Do you want to sign up for Google Plus" (No). I can't think of a single instance when any piece of software has asked me a useful question.

Even worse is stacking several of these questions in a row. Signing into Gmail sometimes gives "Do you want to give us your phone number, for your security and convenience?" (No, I want to read my emails) "Are you sure, a few minutes now could save you hours of frustration in the future?" (Show me my damn email now), "Do you want to sign up for Google Plus?" (No, No, and for the fourth minus one time, No).

And yet you can go further to ensure maximum user frustration. Three or more stacked questions, all blocking, before you show the user what he originally asked for might sound bad enough; removing the "Never ask again" option is one step further. But to reach the levels of many modern apps, remove the "No" button entirely. The "Yes/No" dialogue has become the "Yes/Later" or "Yes/Not now" dialogue. Maybe even these are temporary luxury. How soon before this becomes the norm?







Thursday, 16 January 2014

JQuery Accordion

Today I wanted to implement an accordion effect on a website. Unfortunately I hadn't heard the term "accordion" used in this way before, so it took a couple of badly thought-out Google queries before I chanced upon what I was looking for.
In case I'm not the last human on the planet who is not familiar with JQuery's accordion effect, it refers to text which is displayed and/or hidden by headings - so one can click on a header to expand the body. Probably the best way to work out what I mean is to see an example, so take a look at jqueryui.com/accordion.
It's quite neat, but not exactly what I was looking for. I found a much simpler solution which was exactly what I wanted: see jacklmoore.com/demo/accordion.html. Jack's full tutorial/explanation is at jacklmoore.com/notes/jquery-accordion-tutorial/

About Me

My photo

I'm far away from home in this country called "Europe". I'm studying towards a Master's in Computational Linguistics (I think - this might help: https://xkcd.com/114/). I write about web applications and Python and other things that you may find interesting (considering you got this far).