At work I finally got around to doing a project I’ve been wanting to do for a long time: analyze the sharing behavior of a year’s worth of content at Mashable.
It’s no small project. First, a year’s worth of Mashable content must be collected, which ended up being 13,979 articles in total. Next, the author, publish date, headline, and full text of each post must be extracted from each page, which requires a (fortunately simple) custom scraper to be built. Next, the social resonance data of each article must be collected. For this analysis, I collected share counts for Twitter, Facebook, StumbleUpon, LinkedIn, Google+, and Pinterest, plus clicks from Bitly and per-article submissions from Reddit. Once all that data has been collected and structured, it must be analyzed. I found good ol’ Excel pivot tables to be perfect for most of the analysis. Tracking mentions of key topics from the past year, like Gangnam Style, requires the ability to do full-text searches against headlines and article content, so I indexed all of the data from above into ElasticSearch. It performed brilliantly.
By no means the finest infographic ever made, but hopefully not the worst one, either. Enjoy!
For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 4 in a series of 10, so if you’re just joining now, you may want to take a peek at lessons 1, 2, and 3 to gear up a bit for this post.
Lesson 4 adds variables to Stork, which involves adding statements in addition to the expressions already in the language. The addition of variables provides fodder for some additional (and more interesting) static analysis as well. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions with support for variables. (The variables will become much more interesting over the course of the next couple of lessons, which will add support for functions and control structures.)
    $ java com.sigpwned.stork.Stork
    >>> var x:Int
    >>> x
    ERROR: Variable may not have been initialized: x
    >>> x = 1+2*(3+4)
    15
    >>> var y:Float
    >>> y = x
    15.0
    >>> x = y
    ERROR: Will not coerce to less precise type: Float -> Int
    >>> x = (cast Int) y
    15
    >>> x+y
    30.0
    >>> ^D
    $
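The two errors in that session come from static checks, and the idea behind them can be sketched in a few lines. This is only an illustration in Python, not Stork's actual implementation: the `Scope` class, the `PRECISION` table, and the "Undefined variable" message are all names I made up for the sketch. It also assumes straight-line code like a REPL session, where "initialized" is just a set of names; a language with branches needs real flow analysis.

```python
# Hypothetical precision ranking for this sketch: a higher rank is more precise.
PRECISION = {"Int": 0, "Float": 1}


class CheckError(Exception):
    """Raised when a static check fails."""


class Scope:
    def __init__(self):
        self.types = {}           # variable name -> declared type
        self.initialized = set()  # names that have been assigned a value

    def declare(self, name, type_name):
        self.types[name] = type_name

    def read(self, name):
        """Return the type of a variable read, rejecting uninitialized reads."""
        if name not in self.types:
            raise CheckError(f"Undefined variable: {name}")
        if name not in self.initialized:
            raise CheckError(f"Variable may not have been initialized: {name}")
        return self.types[name]

    def assign(self, name, value_type):
        """Allow same-type or widening assignments; reject narrowing ones."""
        if name not in self.types:
            raise CheckError(f"Undefined variable: {name}")
        target = self.types[name]
        if PRECISION[value_type] > PRECISION[target]:
            raise CheckError(
                f"Will not coerce to less precise type: {value_type} -> {target}"
            )
        self.initialized.add(name)
        return target


# Replay the checks from the session above, collecting the error messages.
errors = []
scope = Scope()
scope.declare("x", "Int")
try:
    scope.read("x")                      # x has no value yet
except CheckError as e:
    errors.append(str(e))
scope.assign("x", "Int")                 # x = 1+2*(3+4)
scope.declare("y", "Float")
scope.assign("y", scope.read("x"))       # y = x widens Int to Float
try:
    scope.assign("x", scope.read("y"))   # x = y would narrow Float to Int
except CheckError as e:
    errors.append(str(e))
scope.assign("x", "Int")                 # x = (cast Int) y: the cast yields an Int
print("\n".join(errors))
```

The explicit cast needs no special handling in this sketch, because by the time the assignment is checked the cast expression already has type Int.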
For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 3 in a series of 10, so if you’re just joining now, you may want to check out lessons 1 and 2 to gear up a bit for this post.
Lesson 3 covers the basics of compiler design (front end versus back end) and types, plus a very brief preview of static analysis. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions.
(Basic) Compiler Theory
Most developers are familiar with the use of compilers like gcc (instant program, just add source code) but aren’t familiar with their inner workings. Stork is intended to dispel some of the mystery around compilers, and it’s far enough along now to start discussing Stork in the greater context of compiler design.
In the most general sense, compilers are simply translators that turn program source code into executable instructions. There are many compilers: javac, the Java compiler, turns Java code into Java bytecode; gcc, the GNU C compiler, turns C code into native instructions; and so on. There are also similar programs called interpreters that execute program source code directly without first compiling it down to instructions, like ruby or the subject of this course, Stork. While interpreters are technically different from compilers, the same design principles apply, so the Stork interpreter will serve nicely as a platform for exploring simple compiler design.
Compiler Design: Front End, Middle End, and Back End
At a high level, compilers look like this:
Basic Compiler Design
For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 2 in a series of 10, so if you’re just joining now, you may want to check out lesson 1 to gear up a bit for this story.
Lesson 2 covers the basics of parsing numerical expressions using the tokenizer implemented in Lesson 1. Evaluation of these expressions will be handled in Lesson 3.
What is Parsing?
If a programming language is a (more) convenient language humans can use to describe tasks to computers, then parsing is the process of turning a program’s tokens into sentences, or “abstract syntax trees” (ASTs), that the computer can understand. For example, consider this simple mathematical expression for the area of a circle with radius 5:
For this program text, the tokens would be
5, and a parser would build the following AST for it:
Clearly, parsing is essentially “sentence diagramming” for a programming language.
This lesson covers how to transform a token stream into a parse tree like the above example. Looking at parse trees — the syntactic relationships among tokens — instead of the tokens themselves will make evaluating those expressions much easier in the next lesson.
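To make the token-stream-to-tree step concrete, here is a minimal recursive-descent sketch in Python. It is not Stork's parser: the grammar, the function names, and the tuple-based tree shape are assumptions made for illustration, and it takes an already tokenized list of strings for the expression 1+2*(3+4).

```python
def parse_expression(tokens, pos=0):
    """expression := term (('+' | '-') term)*"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)  # left-associative: fold into the left operand
    return node, pos


def parse_term(tokens, pos):
    """term := factor (('*' | '/') factor)*"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """factor := NUMBER | '(' expression ')'"""
    if pos >= len(tokens):
        raise SyntaxError("Unexpected end of input")
    token = tokens[pos]
    if token == "(":
        node, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("Expected ')'")
        return node, pos + 1
    if token.isdigit():
        return ("num", int(token)), pos + 1
    raise SyntaxError(f"Unexpected token: {token!r}")


def parse(tokens):
    """Parse a full token list, rejecting any leftover tokens."""
    node, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"Unexpected token: {tokens[pos]!r}")
    return node


ast = parse(["1", "+", "2", "*", "(", "3", "+", "4", ")"])
print(ast)
# ('+', ('num', 1), ('*', ('num', 2), ('+', ('num', 3), ('num', 4))))
```

Each grammar rule gets its own function, and operator precedence falls out of which function calls which: multiplication binds tighter than addition because terms are parsed inside expressions.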
For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 1 in the series, so if you’re just joining now, you haven’t missed much!
What is Tokenization?
If a programming language is a (more) convenient language humans can use to describe tasks and processes to computers, then tokenization is the process of turning a program’s raw program text into words, or “tokens,” that the computer can understand. For example, consider this simple Python program for the factorial function:
    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)
For this program text, the tokens would be def, factorial, (, n, ), :, if, n, ==, and so on. Looking at tokens (atomic units of program semantics) as opposed to characters makes the next lesson’s topic of “parsing,” or discovering the semantic relationships among the different parts of the program text, much easier.
In a very real sense, the tokenizer defines the vocabulary of the programming language.
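As a rough illustration of that vocabulary idea, here is a small regex-based tokenizer in Python, run against the factorial program above. It is a sketch, not Stork's tokenizer: the token kinds and patterns are assumptions chosen to cover just this one program, and it ignores indentation, which a real Python tokenizer treats as significant.

```python
import re

# Hypothetical patterns for this illustration only: names, integers, the
# two-character operator "==", and single-character punctuation.
WHITESPACE = re.compile(r"\s*")
TOKEN = re.compile(r"([A-Za-z_]\w*)|(\d+)|(==|[-+*/():=<>])")


def tokenize(text):
    """Split program text into (kind, value) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while True:
        pos = WHITESPACE.match(text, pos).end()  # always matches, possibly empty
        if pos == len(text):
            break
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"Unexpected character at offset {pos}: {text[pos]!r}")
        name, number, symbol = match.groups()
        if name is not None:
            tokens.append(("NAME", name))
        elif number is not None:
            tokens.append(("NUMBER", number))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens


SOURCE = """
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
"""

tokens = tokenize(SOURCE)
print([value for _, value in tokens[:9]])
# ['def', 'factorial', '(', 'n', ')', ':', 'if', 'n', '==']
```

Anything the patterns don't cover, such as a stray `$`, is simply not a word in the language, and the tokenizer rejects it before a parser ever sees it.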
There are several questions every developer gets asked sooner or later:
- How do you learn to program?
- Should I learn to program?
- What is a program?
- Where do programs come from?
- What is a programming language?
- Where do programming languages come from?
Most of these questions already have excellent answers online. But while the question of where programming languages come from is the topic of some excellent books, I’ve never been able to find a simple, straightforward, free answer online.
I’m happy/sad this Michael Scott is not the author of that book.
Stork is a simple, free programming language I’m writing in ten steps and documenting in writing, code, and Reddit threads. I think Stork fills that hole, and I hope that you, gentle reader, think that it does, too.
Today, Twitter announced version 1.1 of its API. The announcement included some interesting changes:
- All API requests must now be authenticated. Twitter doesn’t talk to strangers anymore. You have to at least introduce yourself before it’ll talk to you.
- API hits are now counted per-endpoint. Some APIs have more hits hourly and some fewer, purportedly based on endpoint popularity.
- Display Guidelines are now required to be observed. If you display tweets off of Twitter, they must be consistent with Twitter’s visual style or else.
- Pre-installed client applications must be certified by Twitter. Applications that come installed on things like mobile devices must be Twitter tested, Twitter approved.
- Twitter app growth is limited to 100,000 users. Apps are only allowed to have 100,000 user tokens before they’re forced to ask Twitter “please, sir, I want some more?”
Twitter developers in 6 months
In short, Twitter started acting like a business. And the world was shocked and appalled.
There are a whole lot of strong opinions about ORM floating around the internet and elsewhere. When you see so many passionate, conflicting opinions in so many different threads, it’s a pretty clear sign you’re looking at a religious argument rather than a rational debate. And, as in any good religious argument — big endian or little endian, butter side up or butter side down, vi or emacs, Team Jacob or Team Edward — this one has two sides, too.
Still a better love story than Twilight.
Code should be simple. Code should be butt simple. Code should be so simple that there’s no way it can be misunderstood. Good code has no nooks. Good code has no crannies. Good code is a round room with no corners for bugs to hide in.
We all know this. So why does most code suck?
Because it’s written by people who don’t understand the problem they’re trying to solve.
After switching over to Lion, I had to re-enable personal sites to continue working on a side project. Fortunately, it’s just as easy in Lion as it was in Snow Leopard. I’ll walk through the setup process in this article. Note that you’ll need administrator access for a few steps in the process, so you’ll need the admin password.
In Snow Leopard, the quickest and easiest way to get sites set up was to use the “Web Sharing” feature of the OS, and it looks like Lion works the same way.