Sunday, 30 October 2016

Rethinking the Keyboard

It is well known that the computer keyboard has been around for a long time and has not recently undergone any major changes. Although the basic QWERTY layout dates back to the days of typewriters, people resist any change to this design because they are 'used to it'. (They were also used to typewriters, but not many complained when they were given a backspace key that worked.)

Since Apple's recent announcements, it is obvious that the time has come to rethink the keyboard. The important points are:

Looking at the keyboard is not actually necessary to type
Having 3D keys instead of virtual keys allows people to more easily work out which keys their fingers are resting on or about to press
Different applications do different things, and therefore it would be nice if different keys could take on different behaviour based on which application was currently in use.

With the above in mind, I make the revolutionary proposal to do away with displays on the keyboard altogether. I know you're immediately wondering how we can cope with only a single display (the monitor), and with no digital display on the keyboard, but consider the following:

New keyboards would have a single row of keys above the number row. I suggest for now that this row should comprise 13 keys. Twelve of these keys would be called 'function keys', and they would be assigned to different functions in a context-sensitive manner. The F4 key (Function 4 key) would be used to close the currently open window, for example, while the F1 key would be used to open an entirely unhelpful 'help Wizard' on M$Windoze systems. If a media application is open, then instead of opening an entirely unhelpful 'help Wizard', F1 would immediately mute the volume. This would be useful, for example, when you bring your laptop out of hibernation in a public lecture and it resumes playing a film, or other private media that you were watching the night before, at full volume.

The final key would be at the top left of the keyboard, and this would be called the Escape (ESC) key. This key would remove the current focus (e.g. deselect an image in word-processing applications and put the focus back on the last cursor position to resume typing); close JavaScript overlays; and exit insert mode in text editors such as vi (somewhat similar to the arguably easier-to-press Ctrl + [ combination, but simpler and more beginner-friendly).

Advantages of removing the keyboard display:

No need to glance up and down between two displays
Fingers can easily work out which function key they need to press without looking down
One less thing to break -- these function keys would be integrated into the main keyboard. They would not even require their own processor, meaning fewer hardware, software, firmware, and driver issues, and more environmentally friendly laptops.

Disadvantages of removing the keyboard display:

We'll never get an "I rewrote Doom 1 using only the Touchbar" post on Medium.

Saturday, 3 September 2016

Reclaiming the web in < 100 lines of code

The Internet seems to become less pleasant by the day for those of us who are here primarily to read. Every now and again (i.e. dozens of times per day), I see a URL that points to an article which looks like it might contain some interesting information. I click on the URL hoping to get a nice big piece of text for me to digest, but instead I'm presented with auto-play videos, a JavaScript overlay asking me to subscribe to a newsletter, another JavaScript overlay asking me to use the site's app (obligatory XKCD: App), another JavaScript overlay telling me not to use an adblocker and still another one which thanks me for not using an adblocker after I've told my adblocker to block the previous one... You get the picture.

Today I'm going to describe how you can greatly improve this experience, focusing specifically on news articles from online media, by building a Reader application and a browser extension. The application will transform web pages from looking like the image on the right, to instead look like the one on the left.

Our arsenal

To create our Reader app, we'll use Python and Flask. The browser extension we create is for Google Chrome, although it should be pretty trivial to adapt for Firefox. We'll be using the Newspaper library for article extraction, and we'll write a little bit of HTML and CSS to display our final article as we want to read it.

I assume that you know some basics, and that you have a working version of Python and Pip installed on your system. I don't go into too much depth about how the various components work, so if you have some previous knowledge of Python, HTML, CSS, and JavaScript, you'll find everything below makes a lot more sense. You should be able to piece everything together even without prior experience though.

Setting up

Newspaper, the library we use for text extraction, is primarily a Python 3 library. There is a buggy fork for Python 2, but I strongly recommend that you use Python 3 to take advantage of the maintained version. I therefore assume that your system is set up in such a way that pip invokes pip3 and python points to the python3 interpreter. Adapt the following as necessary if this is not the case. I'm not going to show the extra commands needed to create a virtualenv and install the packages into it. If you feel strongly about this, feel free to adapt as you see fit.

First we need to install Flask and Newspaper. Run the following commands:

pip install Flask

pip install newspaper3k

For the latter, you may have some issues with the installation of the lxml library. GIYF.

Writing the Python code

The core of our app will be a web server that receives a URL from the user, downloads the content from that URL, extracts the text, reformats it, and returns it.

Create a directory for your project and create a file within this directory called reader.py. Add the following code to this file:

from flask import Flask
from flask import request
from flask import render_template
from newspaper import Article

app = Flask(__name__)

@app.route("/read")
def read():
    # The article to fetch is passed in the query string as ?url=...
    url = request.args.get("url")
    a = Article(url)
    a.download()  # fetch the raw HTML of the page
    a.parse()     # extract the title and the main body text
    # Newspaper separates paragraphs with blank lines
    paragraphs = a.text.split("\n\n")
    return render_template("article.html", paragraphs=paragraphs, title=a.title)

if __name__ == '__main__':
    app.run(port=5000, debug=True)

The first few lines simply import the parts of Flask we'll be using and the Article class from Newspaper, which is all we need to download the article from the URL and perform text extraction on it.

The next line initialises our Flask app. We then define a single route: requests to the "/read" path are handled by the read() function defined directly below it.

Our actual read() function grabs the URL of the desired article from the query-string arguments of the current request. It initialises an Article object, downloads the content from the URL, does Newspaper's magic parsing on it (text extraction is actually a lot more difficult than one might imagine), and splits the resulting text into paragraphs. Finally, it renders an HTML template (which we'll write in the next section), passing in the paragraphs of the article as well as the article's title. We pass in a list of paragraphs instead of the whole chunk of text because Newspaper gives us paragraphs separated by newline characters, which HTML ignores. Our template will therefore wrap each paragraph in its own <p> tags (see the next section).
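To see what that split produces before any HTML is involved, here is a small stand-alone sketch that uses only the standard library. The helper name paragraphs_to_html and the sample text are made up for illustration, and, like read(), it assumes that Newspaper separates paragraphs with a blank line. The html.escape call roughly mirrors the automatic escaping Flask applies to values placed in .html templates.

```python
import html


def paragraphs_to_html(text):
    """Split extracted article text on blank lines and wrap each paragraph in <p> tags.

    Hypothetical helper: approximates what reader.py and article.html do together.
    """
    # Same assumption as read(): paragraphs are separated by "\n\n"
    paragraphs = text.split("\n\n")
    # Escape each paragraph so stray '<' or '&' characters appear as text, not markup
    return "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)


sample = "First paragraph.\n\nSecond one, with a <tag> & an ampersand."
print(paragraphs_to_html(sample))
```

Without the <p> wrapping, a browser would collapse the "\n\n" separators into ordinary spaces and run all the paragraphs together, which is why the list is passed to the template rather than the raw text.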

The final part of the script starts our web application on port 5000, with debug mode turned on, when reader.py is run directly (rather than imported as a module).

Writing the HTML

Now we need to create an HTML template which will form the skeleton of all news articles read through our app. Create a new directory inside your project directory called templates (this name will allow Flask to find your templates, so don't change it). Create a new file inside this directory called article.html. Your project should now have the following structure:

reader
|-- templates
|   +-- article.html
+-- reader.py

In the article.html file, add the following code:

<html>
  <head>
    <title>{{title}}</title>
    <style>
      body {
        font-family: "Helvetica";
        max-width: 900px;
        padding-left: 20px;
        padding-right: 20px;
        padding-top: 30px;
        margin: 0 auto;
        text-align: justify;
      }
    </style>
  </head>
  <body>
    <h1>{{title}}</h1>
    {% for paragraph in paragraphs %}
    <p>{{paragraph}}</p>
    {% endfor %}
  </body>
</html>

This is a Flask template (or more specifically a Jinja2 template). It has the normal structure of an HTML document: an outer <html> element containing a <head> and a <body>. The few lines of internal CSS display the article in a decent font, limit the text to 900px wide (so wider screens get margins on the left and right), add some padding so that the text doesn't creep right up to the edges of the screen, centre the column of text horizontally, and stretch out the text (fully justify it) to give neat vertical edges on the left and right (which many people do not like, so feel free to remove the justify line if you prefer ragged right).

The non-HTML parts of the above code are enclosed either in double braces {{ }} or in brace-percent pairs {% %}. The former are placeholders for the arguments that we pass in from our Python code (i.e. the paragraphs and the article's title); Flask escapes these values automatically in .html templates, so any stray markup in the article text is shown as text rather than interpreted. The latter define control structures: in our case, a simple for loop which goes through each of our paragraphs and adds it to the page, wrapped in its own <p> tags.

That's our entire app. Let's test it.

Testing our web application

To see if our app works, open a terminal or command prompt, navigate to your project directory, and run the reader.py script. To do this, run commands similar to the following (depending on where your project directory is located):

cd git/reader

python reader.py

You should see output similar to Running on http://127.0.0.1:5000/ (Press CTRL+C to quit). Now fire up your web browser and find the URL of a news article you'd like to read (e.g. this one about Mother Teresa: http://www.bbc.com/news/world-europe-37258156).

Navigate to http://localhost:5000/read?url=http://www.bbc.com/news/world-europe-37258156 (substituting the URL you chose above if it's different). If all went well, you'll see the news article presented in a nice compact form, without any of the rubbish that you would normally have inflicted upon you.
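Pasting the article URL straight into the query string works for simple links like the BBC one, but an article URL with its own ? or & parameters would get mangled: Flask would treat everything after the first & as a separate argument to our app. The sketch below shows one way to build the address safely with the standard library's urllib.parse. The build_reader_url helper is hypothetical, and the base address assumes the local app and the /read route from reader.py.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed: the Flask app from reader.py running locally on port 5000
READER_BASE = "http://localhost:5000/read"


def build_reader_url(article_url):
    """Return a reader-app link with the article URL percent-encoded (hypothetical helper)."""
    # urlencode percent-encodes ':', '/', '?', '&' and '=' inside the value
    return READER_BASE + "?" + urlencode({"url": article_url})


link = build_reader_url("http://example.com/story?id=7&page=2")
print(link)
# Parsing the query string again recovers the original article URL intact
print(parse_qs(urlparse(link).query)["url"][0])
```

Flask decodes the percent-encoding when it fills request.args, so read() receives the article URL unchanged.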

Building a Google Chrome extension

Although our application is already usable, it's not very user-friendly. Each time you want to read an article, you have to copy the URL to the clipboard and then construct the long version as shown above. Instead of this, we want to be able to right-click on any URL that we come across while browsing the web, and to easily send that article to our app. To do this, we'll build a Google Chrome extension. A basic Google Chrome extension consists of two parts: a manifest file (JSON), which describes the extension and requests the necessary permissions, and a JavaScript file, which is where the functionality of the extension lives.

Create a new directory called readerExtension and inside this create a file called manifest.json as well as one called script.js.

Inside manifest.json add the following code:

{
  "manifest_version": 2,
  "name": "Plaintext Article Reader",
  "description": "Reformats online news to remove all the gunk",
  "version": "1.0",
  "permissions": [
    "contextMenus"
  ],
  "background": {
    "scripts": ["script.js"]
  }
}

The first few lines simply describe our extension. In the permissions section, we request permission to modify the user's context menus (i.e. the menu that appears when you right-click), and in the background section, we point to script.js, which the browser loads automatically and runs in the background.
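Chrome refuses to load an extension whose manifest is not valid JSON, and a single missing bracket or comma is enough to cause that. One way to catch such mistakes before loading the extension is to parse the file with Python's json module and check for the keys used in this post. This is a minimal sketch: the check_manifest name and the list of expected keys are assumptions based only on the fields above, not on Chrome's full manifest specification.

```python
import json

# Keys used by the manifest in this post (an assumed minimum, not Chrome's full schema)
EXPECTED_KEYS = ("manifest_version", "name", "version", "permissions", "background")


def check_manifest(path):
    """Parse a manifest.json file and return a list of problems (empty if none were found)."""
    try:
        with open(path, encoding="utf-8") as f:
            manifest = json.load(f)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where the parser gave up
        return ["invalid JSON at line {}, column {}: {}".format(err.lineno, err.colno, err.msg)]
    if not isinstance(manifest, dict):
        return ["top level of the manifest must be a JSON object"]
    problems = ["missing key: " + key for key in EXPECTED_KEYS if key not in manifest]
    if "contextMenus" not in manifest.get("permissions", []):
        problems.append('"contextMenus" permission not requested')
    return problems
```

Usage would be something like print(check_manifest("readerExtension/manifest.json")); an empty list means the file parsed and contains the expected keys.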

In the script.js file, add the following code:

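function plaintext(info, tab) {
  // Open the clicked link in our local reader app, in a new tab.
  // encodeURIComponent stops any '?' or '&' in the article URL from being
  // treated as extra parameters to the reader app.
  chrome.tabs.create({
    url: "http://localhost:5000/read?url=" + encodeURIComponent(info.linkUrl),
  });
}

chrome.contextMenus.create({
  title: "View Plaintext",
  contexts: ["link"],  // only show the item when right-clicking a link
  onclick: plaintext,
});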

We start off by defining a function plaintext() which will create a new tab in the user's browser. This tab opens our local Flask app, passing along the URL of the link that was clicked (info.linkUrl).

The second part adds an item labelled "View Plaintext" to the context menu (Chrome merges it into the existing right-click menu for us). We use contexts to say that we only want this item to appear when the user right-clicks on a link, and we use onclick to specify that our plaintext() function should be called when the user selects it.

Installing the Google Chrome extension

To actually publish this as a proper Google Chrome extension would involve going through a lengthy set of steps (and paying Google $5). However, it's easy enough to set Chrome to use Developer mode and to load unpacked extensions.

In Google Chrome, navigate to the extensions page. At the top of the page, tick the box that says "Developer mode". Then choose "Load unpacked extension" and select your readerExtension directory from the file chooser that appears.

Now you've written a Google Chrome extension and installed it! To try it out, simply visit any web page (preferably an online news site, such as http://bbc.co.uk/news), right-click on a link to one of the articles, and click "View Plaintext", which will now appear in the context menu whenever you right-click on a link.

All that's left to do is to enjoy online reading again. Note that your local Flask app has to be running in order for the extension to work, so you'll need to run python reader.py from your project directory before browsing the web.

Where next?

Instead of running the Flask application locally, you can run it permanently from a VPS. Digital Ocean will give you a basic VPS for $5 a month (and if you sign up with them using my referral link, I'll get some credit with them that I can use to keep messing around with stuff like this and writing about it). I'm not going to go into detail on how to deploy a Flask application to a server (although I do do so in my book Flask By Example). Another advantage of running the app remotely is that if you're on a mobile device and have a slow Internet connection, the server can download the large version of the page with all the attached JavaScript and CSS and serve you a much smaller version that still contains the important parts (i.e. the text that you want to read).

That's it for this post. Happy building! You can find all the code presented in this post on GitHub at https://github.com/sixhobbits/reader.

Thursday, 25 August 2016

WhatsApp's new Privacy Policy [fixed]

WhatsApp recently updated their privacy policy. To prevent users from getting skittish, they also wrote a blog post explaining how wonderful everything was. I found some mistakes in their blog post, though, so I thought I'd fix it up for them. The original post can be found here: https://blog.whatsapp.com/10000627/Looking-ahead-for-WhatsApp

~~Looking ahead for WhatsApp~~

About those 17 billion dollars we paid for a chat app? Um, we kind of need to make that back again

Today, we’re updating WhatsApp’s terms and privacy policy for the first time in four years, as part of our plans to test ~~ways for people to communicate with businesses~~ making WhatsApp profitable by allowing businesses to contact you in the months ahead. The updated documents also reflect that we’ve joined Facebook and that we've recently rolled out many new features (we’d like you to focus on the new features, instead of the changes to our privacy policy), like end-to-end encryption, WhatsApp Calling, and messaging tools like WhatsApp for web and desktop. You can read the full documents here.

People use our app every day to keep in touch with the friends and loved ones who matter to them, and this isn't changing (Please go ahead and think about just how useful WhatsApp is to you for a moment. You don’t really have a choice but to agree to our new terms). But as we announced earlier this year, we want to explore ways for you to communicate with businesses that ~~matter to you too~~ may be able to finally turn a profit for us, while still giving you an experience without third-party banner ads and spam (depending on your definition of spam). Whether it's hearing from your bank about a potentially fraudulent transaction, or getting notified by an airline about a delayed flight, or maybe seeing a text message or two that’s actually an advertisement to help us become profitable, many of us get this information elsewhere, including in text messages and phone calls. We want to test these features in the next several months, but need to update our terms and privacy policy to do so (well, maybe “need” is a strong word, but the current ones are a bit inconvenient for us).

We're also updating these documents to make clear that we've rolled out end-to-end encryption (remember to focus on our new features please). When you and the people you message are using the latest version of WhatsApp, your messages are encrypted by default, which means you're the only people who can read them. Even as we coordinate more with Facebook in the months ahead, your encrypted messages stay private and no one else can read them. Not WhatsApp, not Facebook, nor anyone else (History and common sense say that we’ve probably opened up a back door for the NSA, but that’s for like terrorism and stuff, so don’t worry about it). We won’t post or share your WhatsApp number with others, including on Facebook, and we still won't sell, share, or give your phone number to advertisers (but we might let them contact you through WhatsApp. Even though they can use your number in the only way that matters, please focus on the fact that they don’t actually possess those 10 digits that you value so much).

But (remember, anything we say before the word “but” doesn’t really count) by coordinating more with Facebook, we'll be able to do things like track basic metrics about how often people use our services and better fight spam on WhatsApp (Please focus on the ‘fight spam’ part, and skip over the ‘tracking’ part. Also please don’t read this piece on how much can be inferred by looking only at metadata from the EFF: https://www.eff.org/deeplinks/2013/06/why-metadata-matters). And by connecting your phone number with Facebook's systems, Facebook can offer better friend suggestions and show you more relevant ads (which will help us make money) if you have an account with them. For example, you might see an ad from a company you already work with, rather than one from someone you've never heard of (not in a creepy way though. Don’t worry. This is all about profit). You can learn more, including how to control the use of your data, here.

Our belief in the value of profiting from private communications is unshakeable, and we remain committed to giving you the fastest, simplest, and most reliable experience on WhatsApp. As always, we look forward to your feedback and thank you for using WhatsApp.

Friday, 19 August 2016

Do what other people are doing, but more meta

A common pattern among the computer science crowd is the desire to find a gap in the market. We've seen people like Mark Zuckerberg receive the same knowledge that we have, and turn that knowledge into money. Many people I know of have gone through approximately the same progression that I did in terms of becoming dissatisfied with academia for being too impractical (is anyone actually going to read that thesis?), followed by becoming dissatisfied with industry for being too uninspiring (yay, I fixed that unit test. Again). These people then start looking for gaps in the market -- waiting for that One Great Idea (tm) to come down from above and strike them between the eyes.

The first thing to realise is that ideas are worthless. As many people have noted, there is no market for ideas, and this is for good reason. They're not worth anything. You can patent an invention, but not a startup idea. Your idea might be good, but it's not going to make money on its own. Your product might be OK, but it's not going to make money unless it's polished and marketed. And as a single developer working on your weekends, you're unlikely to be able to build anything reliable that's also easy to use and which solves an actual problem. And then tell people about it.

Now that we have that out of the way, ideas are still important. And ideas are fun. I have notebooks full of ideas -- some of them I've shared with others for feedback. A select few are in the process of being transformed into code in private git repositories. I enjoy playing around with ideas, even if it's good to keep a healthy scepticism on how successful they'll become.

A good shortcut for finding more interesting ideas than those of other people is through the concept of 'meta'. A meta-thought is a thought about thoughts -- i.e. one of the things that we believe makes us better than the apes. Metadata is data that we keep about other data -- think of that "last modified" column in your file explorer. That's data. Your files are also data. So it's data which is describing data. Wow. Inception. Metaception. Mind == Blown.

But more seriously, as you listen to other people's ideas, try to see a layer behind their idea. Or if you are thinking of an idea, look for the idea behind that. Three quick examples will hopefully clarify this:

People are creating startups. Most of them fail. Some smart people avoid failure by creating startup incubators instead of startups. They buy some cheap warehouse space and offer internet, coffee, and 'mentorship' to other people who want to run a startup. Most of the startups themselves fail, but they still pay their fees to the incubator. And the few that are successful also give a percentage of their shares to the incubator. The incubator isn't hurt by the failures and makes a fortune out of the successes -- all through taking other people's ideas one layer of meta deeper.
People are playing on the stock market and buying crypto-currencies. Some of them make a lot of money and write about their successes to encourage others to try the same. Many others are losing all their money -- they tend to be a bit quieter and keep their heads down. No-one likes talking about them. The people in the game who are reliably making money are either the stock markets themselves (Wall Street is worth a bit), or the ones who are selling data, books, code, and tutorials to the people who want to gamble their money directly. Again, these people are making money on others' successes and not losing it on their failures.
In non-tech circles, people still make money by proofreading, though not very much. If you are part of the minority that has a good understanding of the grammar of your native language, it's easy enough to find clients who are a bit bewildered by exactly how commas and apostrophes work, and who have read the distinction between effect and affect several times and have given up trying to work out when to use which. However, the hourly rate for proofreading tends to be pretty miserable. I once attended a three-day proofreading course, though, and paid the single instructor several thousand ZAR for the privilege. I was one of dozens of people to do so, and the instructor made more money in three days using his proofreading knowledge than many of the attendees would make in their lifetimes with the same knowledge.

Of course, once you start doing this, you might never stop. What about a startup incubator that trains other people to create startup incubators? Or someone who teaches people how to teach? Or someone who writes blog posts like this one? Be careful of the rabbit hole, Alice. People who go down do not always re-emerge.

Saturday, 26 March 2016

Data Science and Higher Education South Africa data

TL;DR
* I'm exploring "data science" and related technologies, including R
* This is a fun "puzzle": http://priceonomics.com/the-priceonomics-data-puzzle-treefortbnb/
* There exist some nice open data sets relating to Higher Education in South Africa

Data Science
"Data Science" is as much of a buzzword as "The Cloud", "Big Data", and "Artificial Intelligence", and many intelligent people will make unidentifiable sounds of contempt when they hear or read it. But like the other buzzwords mentioned, "Data Science" started out as an interesting idea, which the media, recruiters, and marketing departments ran away with in order to impress various stakeholders and make lots of money.

With an increasing number of open data sets being made available (see https://en.wikipedia.org/wiki/Open_data), being able to get information from raw data is an ever-more useful skill to learn. I came across a fun and simple puzzle recently here http://priceonomics.com/the-priceonomics-data-puzzle-treefortbnb/ and decided to use it as a starting point for learning more about technologies that are useful for data analysis. While Python is normally the first tool I'd turn towards to solve a problem like this, I recently saw some quite impressive work done with R. I was surprised by how easy it was to carry out common data manipulations and visualisations and I wanted to try it for myself.

I won't go into detail on how I solved the puzzle linked above, as Priceonomics use it as part of their recruitment process. But I downloaded R, and messed around with it and the Treefort dataset for an evening and had a lot of fun. Below is a brief write-up on my first experiences with R, and the most interesting graphs from the South African education data set I was using. There's also a link to the Excel spreadsheet I used instead of R.

Why not Python?
One of the main reasons I enjoy Python is the intuitiveness of its syntax. If I don't know how to do something using a Python library, I can usually fire up a shell and, with a combination of dir() and guesswork, work out how to do what I want faster than looking it up on Stack Overflow. However, with matplotlib, numpy, and pandas, I always find the opposite. Even when faced with a very basic problem, I often find myself trawling through documentation and examples to work out how to solve it.

While manipulating and plotting data from a .csv file in R, I very quickly got into my Python habits of using trial-and-error and the R help() command. It was very satisfying to read, manipulate, and plot data in a few lines of code. My current impression of R (which will almost certainly change drastically as I use it more) is that it will fit somewhere between M$Excel and Python for me. If I just want to do some really basic calculations, I'll use Excel. If I want to build and maintain a 100+ line programme that I'll need to use and change for the foreseeable future, I'll use Python. And if I need to mess around programmatically with rows and columns, but I don't need to build anything maintainable, R looks like it could be a good compromise between the two.

South African Education Data
There's no shortage of data sets to play with. Cape Town open data (https://web1.capetown.gov.za/web1/OpenDataPortal/) was the first place I looked, but it seems that that initiative was a bit of a letdown. While there's some interesting data available, most of it is hugely inconsistent in format, and looks as if it was intended for human consumption instead of for programmatic analysis. I thought education data might be interesting, and I found that the datasets available here http://chet.org.za/data/sahe-open-data were comprehensive and fairly consistent. Unfortunately they're also presented in .xlsx format instead of .csv, which makes them less convenient than the Treefort data set to load directly into R.

I converted them to .csv files and loaded them into R, but I need to spend some more time with R's syntax and libraries to efficiently work with data in non-ideal formats. The pain point was the double headers in most of the data sets. For example, the dataset of enrollments by race looks like this:

I wanted to graph the data by institution, as in the picture below. Getting the specific row that represented each institution in R was straightforward enough, but I couldn't easily find a way to transform the data to use the year as the x-axis, the categories as separate series, and the numbers as the y-axis. I'm sure it'll seem trivial once I've worked out how to do it, but I decided to play around with the data in M$Excel first so I could have a clear goal in mind before diving deeply into R.

UCT enrollment by race
Interestingly, of all the institutions listed, UCT is the only one to have any crossing lines.
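For comparison with the R and Excel approaches, here is a rough sketch of the reshape described above, written in Python with only the standard library. It assumes a hypothetical layout, since the actual CHET files may differ: the first column holds the institution name, the first header row holds years (left blank after the first cell of each group, which is what merged Excel cells typically turn into when saved as .csv), and the second header row holds the categories. The function name and the sample values are made up for illustration.

```python
import csv
import io


def reshape_by_institution(csv_text):
    """Turn a CSV with two header rows into {institution: {year: {category: value}}}.

    Assumed (hypothetical) layout:
      column 0        -> institution name
      header row 1    -> year, blank where a merged Excel cell spanned several columns
      header row 2    -> category
      remaining cells -> whole-number counts, blank if missing
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    year_row, category_row, data_rows = rows[0], rows[1], rows[2:]

    # Forward-fill the blanks that merged year cells leave behind
    years, current_year = [], None
    for cell in year_row[1:]:
        current_year = cell or current_year
        years.append(current_year)
    categories = category_row[1:]

    result = {}
    for row in data_rows:
        if not row:  # skip blank lines
            continue
        institution, values = row[0], row[1:]
        by_year = result.setdefault(institution, {})
        for year, category, value in zip(years, categories, values):
            by_year.setdefault(year, {})[category] = int(value) if value.strip() else None
    return result


# Made-up names and numbers, purely to show the shape of the output
sample = """Institution,2010,,2011,
,Group A,Group B,Group A,Group B
Institution X,100,200,150,180
Institution Y,50,60,55,58
"""

table = reshape_by_institution(sample)
for year in sorted(table["Institution X"]):
    # One entry per year; each category becomes its own series when plotted
    print(year, table["Institution X"][year])
```

Once the data is in this shape, the years give the x-axis and each category key gives one line on the graph, which is the structure the VLOOKUPs in the spreadsheet below produce by hand.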

I've used Excel pretty extensively in the past, and even taught an introductory course on it, so it was much easier to clean and manipulate the data and create pretty graphs than working out how to do everything in R. I loaded the simplest datasets from the CHET collection (Race, Gender, and Success) into separate worksheets, created some hacky VLOOKUPs to separate the time and category data by institution, and added some graphs in a separate sheet. A screenshot of the result is below - the big cell at the top is a dropdown that contains all the institutions, and the graphs update dynamically when a new institution is selected.

All data for Rhodes University
I'm not sure what happened to the success rate in 2013 - none of the other institutions showed a similar decline. Hopefully it's a mistake. (My brief lecturing attempt at Rhodes was last year, so it can't be caused by that).

Soon, I'll attempt to replicate the graphs using R, and write a follow up post about how I do it. I'll also extend the data sets I looked at, and if there's anything interesting I'll write a post which focuses on the data instead of the technology used to analyse it.

If you want to play around with the education data and see the graphs for the other institutions, you can download the Excel spreadsheet I built here: https://docs.google.com/uc?authuser=0&id=0ByEENivQuwUBSmNJUXdPbDI2cU0&export=download. The messy VLOOKUPs would probably be enough to have me expelled from any respectable computer science institution, but luckily I don't belong to any. Feel free to write me snarky comments below on how I could have done it in a cleaner way.

Gareth's Tech Blog