Thursday, December 26, 2013

Creating a Twitter 'Bot on Google App Engine in Python

I've been running a Twitter 'bot from my laptop using Windows Task Scheduler for the past several months, and finally decided that it's time to upload it to a server to run from the cloud. @BountyBot tweets new and interesting bounty questions from Stack Overflow several times per day. By following the instructions below, you can set up your own Twitter 'bot that runs on Google App Engine.

What you'll need to get started:
  1. Python 2.7
  2. A Google App Engine Account
  3. The Google App Engine SDK
  4. A Twitter account (with authentication credentials)
  5. A Python library for accessing the Twitter API
  6. A source of information to tweet about

What is Google App Engine?

Google App Engine is Google's cloud computing platform. It allows you to create web applications and run them on Google's existing infrastructure. GAE supports web applications written in Python, Java, PHP, and the Go programming language.  You can find a lot more information on the Google App Engine page, or keep reading for a quick guide to getting set up and posting a Python app on Google App Engine.

Setting up a Google App Engine Account

You can set up a Google App Engine Account for free.  You're only charged for the resources that your application uses (that is, if you even surpass the free service quotas), so it can be a very good low-cost alternative to traditional web hosting providers that charge a flat monthly rate, particularly for a low-resource application like a Twitter 'bot.

Go to https://accounts.google.com to sign in to the GAE dashboard with your Google account credentials. From there you can create a new application. You'll need to provide an application identifier that will be used in your application's configuration later.

Installing Python 2.7

If you don't already have Python installed, you can just download version 2.7 from the official Python download page. If you already have Python on your machine, the Google App Engine SDK installer (see next section) will check for the correct version for you. If it reports an error, you may need to download the correct version of Python, or re-install Python 2.7 in the default location for your operating system.

Installing the Google App Engine SDK

Google App Engine has its own software development kit (SDK) available for free download that allows you to quickly get started developing your app. Choose Python from the Downloads page, then download and run the installer for your operating system. This should place a shortcut on your desktop to a program called the Google App Engine Launcher.

Deploying and Testing

Before we get to the Twitter 'bot, let's create a quick test page and deploy it to Google App Engine to make sure everything we've done so far is working. The Google App Engine Hello, World! documentation already shows how to create, test, and deploy an application from the command line. I'm going to show how to do the same simple application from the Google App Engine Launcher desktop program that we downloaded and installed in the last section.

Launch the Google App Engine Launcher program and choose Create New Application... from the File menu.


You can provide a new name, or the same name you provided as an Application Identifier when you created your Google App Engine account earlier. Also provide a parent directory on your system for the project files to be stored, then click the Create button. If you go to the project directory, you'll see that several files were created for you.

  • app.yaml - The configuration file that maps URLs to handler scripts. This file also contains your unique application identifier and a version number that allows you to roll your app back to specific versions from the GAE admin console.
  • favicon.ico - The icon that will be displayed in browser tabs when your app is viewed. The Google App Engine icon is the default.
  • index.yaml - A configuration file that specifies which indexes your app uses in the App Engine datastore. Not used in this application.
  • main.py - A Python script that handles requests using the webapp2 framework.

Select the newly created application in GAE Launcher and click the Run button. If you open the log console you'll be able to see what commands are run.  When the application is running, you'll be able to go to http://localhost:8080/ in your browser and see the program's output. (If instead of a running app you get an error message at this point, you may need to go to Edit > Preferences and set the correct Python path.)

Next, click the Deploy button in GAE Launcher. You'll be prompted for your email address and password, then all of the application files will be uploaded to Google App Engine.  Once this is done, you can visit your application's public URL to view the program's output again. Congratulations, the app is now online!

Getting your Twitter Account Authentication Credentials

Before you can post tweets from your Google App Engine project, you'll need to set up some authentication credentials with your Twitter account.  Sign in to the Twitter Developers page, choose My applications from the menu at the top right, then create a new application.  You'll need to change the application type to Read and Write on the settings tab in order to give the new application access to post tweets to your Twitter account.

You'll need the Consumer key and Consumer secret from the OAuth settings section, and you'll need to create an Access token and Access token secret. Be careful! You want to keep these values secret so that other people can't use them to post status updates to your Twitter account.  I keep them in a separate properties file so that they stay out of my source code, and don't accidentally get published where people can access them. You'll see how these values are used in a later section, when we look at the BountyBot code.

Tweeting in Python

You'll need a Twitter API wrapper in order to post tweets in Python. I used tweepy when creating BountyBot because it makes posting a tweet as simple as possible. Once a status message is composed, posting it on Twitter can be done in four lines of code using tweepy, and three of those are for authentication. It doesn't get much simpler. Conveniently, tweepy is also compatible with Python 2.7.

You can download tweepy by following the instructions on the GitHub project linked above. Since it needs to be uploaded to Google App Engine in order to be used by the web app, I just copied the entire tweepy directory into the project directory for my GAE application.

What does a Twitter 'bot tweet about?

Even if it's just for your own personal amusement, you're going to want to give your 'bot something interesting to tweet about. Fortunately, there are a lot of 'bots already on Twitter to look to for inspiration.  There are 'bots that tweet weather updates, breaking news headlines, stock quotes, the price of Bitcoin, and other seemingly random facts. You're really only limited by your imagination and Twitter's 140-character post limit. Check out sites like ProgrammableWeb, Data.gov, and World Bank for thousands of data sets and APIs to use.

Stack Overflow and Bounties

The Twitter 'bot I'm going to use for this demonstration gets its information from Stack Overflow using the Stack Exchange API. Stack Overflow is a question and answer site for programmers. Professional programmers and students post questions about code that they're writing for other programmers in the community to answer. The best answers get voted up, earning reputation points for the person who posted the answer. If a question doesn't get a good answer for some time, a "bounty" of bonus reputation can be placed on the question by anyone who wants to get an answer (provided they have the extra reputation to spend on a bounty).

Bounties last for seven days, unless the person who placed the bounty awards it early. You can view all of the questions that have open bounties on the featured questions tab.  Since Stack Overflow is a very active site (thousands of questions are posted every day), it sees about 60 bounties posted per day on average. This is why it's convenient to have a Twitter 'bot that posts links to only the most interesting bounty questions (as determined by the amount of the bounty and the number of upvotes the question receives).

All of the questions and answers posted on Stack Overflow are accessible through the Stack Exchange API, including a method for returning information about questions with active bounties. The Python code we'll look at in the next section will call this API method to get all of the bounties posted in the past 8 hours.
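Because bounties last seven days, a bounty posted in the past 8 hours is one that closes between 7 days minus 8 hours and 7 days from now. Here is a minimal sketch of that arithmetic. The function and constant names are my own for illustration and are not taken from the BountyBot source, and the example run time is arbitrary.

```python
import calendar
from datetime import datetime, timedelta

BOUNTY_LIFETIME = timedelta(days=7)  # bounties last seven days
LOOKBACK = timedelta(hours=8)        # each run covers the past 8 hours


def to_unix(moment):
    """Convert a naive UTC datetime to a Unix timestamp (whole seconds)."""
    return calendar.timegm(moment.utctimetuple())


def closing_window(now):
    """Return (start, end) Unix timestamps for bounties posted within the
    last LOOKBACK period, expressed in terms of when they will close."""
    end = now + BOUNTY_LIFETIME
    start = end - LOOKBACK
    return to_unix(start), to_unix(end)


now = datetime(2013, 12, 26, 13, 0, 0)  # example run time, in UTC
start, end = closing_window(now)
print(start, end)
```

The two timestamps are always exactly 8 hours apart, and the later one is always exactly 7 days after the run time.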

Putting it all together

Now that all the pieces are in place, we can see how they all fit together. You can take a look at the full code for BountyBot on GitHub, and I'll explain several key points here.

The tweet_bounty.py file contains all of the updated code for BountyBot to run on Google App Engine. It follows the same basic structure as the "Hello, World!" example that we looked at earlier. The script contains a class named TweetBounty that extends the webapp2.RequestHandler class. The get method of this class is configured to handle requests.

The get method queries the Stack Exchange API for the most recently posted bounties, finds the most interesting bounty in that list, formats it into a 140-character (maximum) message, then posts that message as a status update to Twitter.

  • request_bounties - Requests a list of bounty questions from the Stack Exchange API. The most recent bounties are those that will expire in one week, so the time stamps passed to this method form an eight-hour window that ends one week from the current time and date.
  • find_max - Loops through the list of bountied questions and returns the one with the highest bounty amount. Upvotes on the questions are used to break ties.
  • format_status_msg - Takes the maximum bounty question and formats it into a 140-character message for posting to Twitter. (Question title, short link to the question, bounty amount, and most relevant tags that will fit in the 140-character limit.)
  • tweet - Takes the formatted status message and posts it to the Twitter account whose authentication credentials are supplied in the settings.cfg file.
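To make the selection and formatting steps above concrete, here is a rough sketch of how find_max and format_status_msg could work. This is not the actual BountyBot code. It assumes each question is a dictionary with the title, score, bounty_amount, and tags fields that the Stack Exchange API returns, the sample questions and the short link are made up, and tags are simply added in order until the message is full (the real 'bot picks the most relevant ones).

```python
MAX_TWEET_LENGTH = 140


def find_max(questions):
    """Return the question with the largest bounty; upvotes break ties."""
    return max(questions, key=lambda q: (q['bounty_amount'], q['score']))


def format_status_msg(question, short_link):
    """Build 'title link +bounty #tag ...' in at most 140 characters."""
    suffix = ' %s +%d' % (short_link, question['bounty_amount'])
    title = question['title']
    room = MAX_TWEET_LENGTH - len(suffix)
    if len(title) > room:
        # Shorten the title so the link and bounty amount always fit.
        title = title[:room - 3] + '...'
    status = title + suffix
    for tag in question['tags']:
        hashtag = ' #' + tag
        if len(status) + len(hashtag) > MAX_TWEET_LENGTH:
            break
        status += hashtag
    return status


# Made-up sample data for illustration only.
questions = [
    {'title': 'How do I parse JSON?', 'score': 3,
     'bounty_amount': 100, 'tags': ['python', 'json']},
    {'title': 'Why is my loop slow?', 'score': 7,
     'bounty_amount': 100, 'tags': ['python', 'performance']},
    {'title': 'What is a closure?', 'score': 20,
     'bounty_amount': 50, 'tags': ['javascript']},
]
best = find_max(questions)
status = format_status_msg(best, 'http://example.com/q/2')
print(status)
```

Comparing (bounty_amount, score) tuples gives the tie-breaking behavior for free: the second question wins here because it matches the first on bounty amount but has more upvotes. Note that this sketch counts characters literally and does not check whether a tag makes a valid hashtag.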

The tweet function is where the magic happens, so it's worth taking a closer look at it here.

# Requires the standard ConfigParser module (Python 2.7) and the
# tweepy package that was copied into the project directory.
import ConfigParser
import tweepy

# Update the Twitter account authorized
# in settings.cfg with a status message.
def tweet(status):
    config = ConfigParser.RawConfigParser()
    config.read('settings.cfg')

    # http://dev.twitter.com/apps/myappid
    CONSUMER_KEY = config.get('Twitter OAuth', 'CONSUMER_KEY')
    CONSUMER_SECRET = config.get('Twitter OAuth', 'CONSUMER_SECRET')
    # http://dev.twitter.com/apps/myappid/my_token
    ACCESS_TOKEN_KEY = config.get('Twitter OAuth', 'ACCESS_TOKEN_KEY')
    ACCESS_TOKEN_SECRET = config.get('Twitter OAuth', 'ACCESS_TOKEN_SECRET')

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    result = api.update_status(status)

The tweet function takes in status as an argument and posts it to Twitter. The first two lines of the function read a configuration file that contains the Twitter authentication credentials we set up earlier, and the next four lines read those values from the file. If we were going to reuse these values it would be worth it to pull this part of the code out into a separate function, but since this script only accesses Twitter once, we can do it all in one place.

The last four lines use the authentication credentials loaded from the file to prove to Twitter that the script has permission to post status messages on the associated Twitter account, then update the status of that account with the status message passed in as an argument. That's all there is to it.
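If you do want to reuse the credentials, the file-reading part could be pulled out into its own function along these lines. This is only a sketch: load_credentials is a name I made up, the key values are placeholders, and the sample file is written to a temporary location just so the example runs on its own. The section and option names match the ones used in the tweet function above.

```python
import os
import tempfile

try:
    import ConfigParser as configparser  # Python 2.7, as used in this article
except ImportError:
    import configparser                  # Python 3

SECTION = 'Twitter OAuth'
KEYS = ('CONSUMER_KEY', 'CONSUMER_SECRET',
        'ACCESS_TOKEN_KEY', 'ACCESS_TOKEN_SECRET')


def load_credentials(path):
    """Read the four OAuth values from a settings file into a dict."""
    config = configparser.RawConfigParser()
    if not config.read(path):
        raise IOError('could not read %s' % path)
    return dict((key, config.get(SECTION, key)) for key in KEYS)


# What a settings.cfg file might look like (placeholder values only).
SAMPLE_SETTINGS = (
    '[Twitter OAuth]\n'
    'CONSUMER_KEY = my-consumer-key\n'
    'CONSUMER_SECRET = my-consumer-secret\n'
    'ACCESS_TOKEN_KEY = my-access-token\n'
    'ACCESS_TOKEN_SECRET = my-access-token-secret\n'
)

handle, path = tempfile.mkstemp(suffix='.cfg')
with os.fdopen(handle, 'w') as settings_file:
    settings_file.write(SAMPLE_SETTINGS)
try:
    credentials = load_credentials(path)
finally:
    os.remove(path)

print(sorted(credentials))
```

Raising an error when the file can't be read is a deliberate choice here. Without that check, a missing settings.cfg only shows up later as a confusing "no section" error.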

Scheduling tweets with cron

Now that we've got a web app up and running that can post to Twitter, all that's left is to set up a schedule for when those tweets should be posted using cron. You do this on Google App Engine by creating a cron.yaml file that specifies when you want tasks to be executed. Tasks for BountyBot in cron.yaml have the following format:

- description: daily 1PM tweet
  url: /tweet_bounty
  schedule: every day 13:00
  timezone: America/New_York

The first line is just a description and doesn't change the way cron behaves. The second line is the URL of the script that needs to be run on a schedule. It's a relative URL from the base of the application on GAE. The next line tells cron when to run the script, and the last one specifies what timezone that schedule is based in. If you leave out the timezone, GAE will assume UTC. Since I want BountyBot to post tweets three times per day, I have three entries in my cron.yaml file, one each for 5AM, 1PM, and 9PM in my timezone.
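To show how the three entries sit together in one file, here is a small sketch that prints a complete cron.yaml. The entries go under a top-level cron: key. The 5AM and 9PM descriptions are my guess at the pattern, since only the 1PM entry is shown above, and build_cron_yaml is just an illustrative helper, not part of BountyBot.

```python
ENTRY_TEMPLATE = (
    '- description: daily %(label)s tweet\n'
    '  url: /tweet_bounty\n'
    '  schedule: every day %(time)s\n'
    '  timezone: America/New_York\n'
)

# (label, 24-hour time) pairs for the three daily runs.
RUNS = [('5AM', '05:00'), ('1PM', '13:00'), ('9PM', '21:00')]


def build_cron_yaml(runs):
    """Return the text of a cron.yaml file with one entry per run."""
    entries = [ENTRY_TEMPLATE % {'label': label, 'time': time}
               for label, time in runs]
    return 'cron:\n' + '\n'.join(entries)


cron_yaml = build_cron_yaml(RUNS)
print(cron_yaml)
```

In practice you would just type the file by hand, since it rarely changes. The point is only that every entry repeats the same four fields with a different schedule.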

Important: Since scheduled tasks usually do things that you only want to be done on a schedule and not by users visiting the URL of the script (like posting status updates to your Twitter account, for one example), it's important to secure those scripts so that they can only be run by a site administrator (you) and the task scheduler (cron). You can secure a script by adding login: admin to its entry in your app.yaml file.

- url: /tweet_bounty
  script: tweet_bounty.app
  login: admin
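As an extra check inside the handler itself, you can also look for the X-Appengine-Cron header, which App Engine adds to requests made by its cron service. The sketch below is not part of BountyBot. The function name is hypothetical, and a plain dictionary stands in for the headers of a real request.

```python
def is_cron_request(headers):
    """Return True when the request carries App Engine's cron header."""
    return headers.get('X-Appengine-Cron') == 'true'


# A plain dict stands in for the request headers in this example.
from_cron = is_cron_request({'X-Appengine-Cron': 'true'})
from_browser = is_cron_request({'User-Agent': 'Mozilla/5.0'})
print(from_cron, from_browser)
```

This complements login: admin rather than replacing it. For example, the handler could skip posting when the header is missing, so that an administrator opening the URL in a browser doesn't trigger an extra tweet.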

You can read more about formatting tasks in cron.yaml in the GAE article Scheduled Tasks With Cron for Python.

Finally...

If you made it this far, congratulations, you're a Google App Engine expert! Just kidding. This article really only scratches the surface when it comes to Google App Engine, but it does serve as a quick start guide that you can use to get something up and running in a weekend. Be sure to visit the Google App Engine developer's guide to find much more in-depth tutorials, sample code, and videos that explain all the features of the platform. If you run into problems, remember that you can always search or ask the experts on Stack Overflow for some help. Good luck!