Killing the Fail Whale With Twitter’s Christopher Fry
- 9:30 AM
Christopher Fry is Twitter’s 43-year-old senior vice president of engineering. He runs everything engineering-related at the company. This means he’s the guy whose job it is to make sure Twitter can handle the massive volumes of tweets that flow across its servers every time, say, Miley Cyrus learns a new dance move at a strip club. He’s a big dude, a surfer and sailor, who came to the company from Salesforce. He also did a post-doc in computational neuroscience at Berkeley, where he studied the auditory cortex of zebra finches. WIRED sat down with Fry to talk about how Twitter will continue to grow and what keeps him up at night, and to find out whatever happened to the Fail Whale.
WIRED: Is there anything about the language of song birds that you can apply to engineering at Twitter?
Fry: The interesting thing about bird songs is they’re learned. They’re this example of this complex learned behavior that’s passed down. Actually, a lot of the original work was done here at Berkeley. They studied basically the dialects of birds in the Bay Area. So there are whole maps of white crown sparrows and how their language changes across the geography of the Bay Area.
Once I left academics, I started doing startups and started moving into the technology world. But one of the things I bring to every job is this love of learning. One of the things we did this year was found Twitter University, which is really about creating this locus of learning inside the organization and building a learning organization. We acquired Marakana and got two really great founders to come in and basically build a world class technical training inside Twitter, provided for free. Every engineer could become an expert in Android or iOS. We have all kinds of different programming languages. It’s really been this incredibly fun thing to create. We want Twitter to be able to do whatever we need it to do within three months, the whole organization. The university gives us that ability to adapt and learn.
WIRED: I’d assume you’d want engineers to have ownership of specific projects. Does that mean that, like, you would want your people who are on iOS to know also about Android as well, just to know it?
Fry: You know, it’s generally good if, one, people appreciate what everybody else is doing and, two, have general knowledge and can work around the systems. So, just like any system, if you have too much specialization you get brittle and you can’t change quickly. In a perfect world, everybody would be able to do everything. You obviously have specialists, and specialists are important. But to the extent that our engineers can have a high degree of aptitude in any discipline, it’s good for us. Good for the teams and good for what we need to do.
WIRED: So, do you have people who are working on multiple projects at once?
Fry: We do. It’s interesting. When we were looking at scaling out mobile, we wanted to make sure that we moved away from this one team inside Twitter building mobile products to scaling out mobile across engineering. So, what we did there was train up a bunch of people to work in Android and iOS, and then we took the mobile team and we left sort of a core team intact but put the mobile engineers out onto the different product teams so that we built a mobile capacity across all of engineering. Twitter has a long history of being mobile-first, but we wanted to extend that even more. We make sure every place we’re building a product, we’re building it onto mobile devices. So, part of what we did was, one, bring up experts in whatever it was and then, two, distribute the teams but still keep core teams that focus on the core mobile infrastructure in place. So, that’s the best long answer to your question.
WIRED: We’re hitting the point where more than half the world has a smartphone. People are coming online, many for the first time, in countries where they’re buying things like twenty-five dollar Android handsets. What type of engineering challenges does that pose?
Fry: There’s two or three things that you have to think about. One is, people are used to working on the web where you can know everything that’s happening in real time. One of the strategies you have to take (we’ve taken this and are pretty prepared for it) is building all the infrastructure that you have on the web into your mobile frameworks. This gives you the ability to experiment, the ability to try things out, the ability to iterate quickly. People sometimes think about mobile products as these shipped, static products and web products as very dynamic and pliable. You have to create the infrastructure to have a dynamic and pliable infrastructure in mobile. On the web, you can track every click. To build great products you have to have that insight into mobile.
Generally, not everybody around the world has the latest iPhone or Android device. So you have to basically tailor your product to run well in places where there are lower-end devices, and maybe not as good networks, or even very unreliable networks.
WIRED: Do you engineer for the lowest common denominator?
Fry: You don’t engineer for the lowest common denominator, but you do tailor the product that you deliver to the market you’re going into. So you’ll have a team that’s focused on creating the Twitter experience for that market.
WIRED: I want to talk about scaling and stability. I read something where you said that Twitter was trying to solve its problems by throwing machines at them rather than approaching them from an engineering standpoint. Is that…
Fry: Did I say that? I don’t think I said that.
Fry: Twitter definitely has had scaling issues in the past, and one of the opportunities I saw coming into Twitter was both scaling out the infrastructure and scaling out the organization at the same time. Having gone through that at Salesforce, I was able to bring that learning with me. When I think about the infrastructure problems we had, there was a key problem that we had to solve which was decomposing our monolithic code base. We had a monolithic Ruby server and we were able to basically decompose that into a set of services. Then applying Mesos as that layer of indirection gives us a way to pack services onto machines to get higher utilization. We can get reliability and efficiency at the same time on top of faster developer productivity as well.
WIRED: Tell me what Mesos is if you don’t mind.
Fry: Mesos is our version of elastic compute. It sits between the hardware operating system and what developers deploy, so it gives you a scalable way to deploy services to a set of boxes. It becomes like the operating system for a data center, if you will.
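To make the idea of packing services onto machines concrete, here is a minimal sketch of a first-fit decreasing bin-packing heuristic. It is an illustration only and not how Mesos itself schedules work; the `Machine` class, the function names, and the CPU figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """A hypothetical machine with a fixed CPU capacity (not a Mesos type)."""
    capacity: float
    services: list = field(default_factory=list)
    used: float = 0.0

    def fits(self, demand: float) -> bool:
        return self.used + demand <= self.capacity

    def place(self, name: str, demand: float) -> None:
        self.services.append(name)
        self.used += demand


def pack_services(demands: dict, capacity: float) -> list:
    """First-fit decreasing: take services from largest to smallest and put
    each on the first machine with room, adding a machine when none has room."""
    machines = []
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{name} needs more than one machine's capacity")
        target = next((m for m in machines if m.fits(demand)), None)
        if target is None:
            target = Machine(capacity)
            machines.append(target)
        target.place(name, demand)
    return machines


def utilization(machines: list) -> float:
    """Fraction of the provisioned capacity that is actually in use."""
    if not machines:
        return 0.0
    return sum(m.used for m in machines) / sum(m.capacity for m in machines)
```

With made-up demands of 3, 2, 2 and 1 cores and 4-core machines, the four services fit on two fully used machines; giving each service its own machine would need four machines at 50 percent utilization.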
WIRED: Other people are using it as well, right?
Fry: Yeah, it’s used outside Twitter. I think it’s used a bunch of places. It’s an open source project…
WIRED: You smiled when you said that. Are you proud that it’s used…
Fry: I am, I am, I am. I think it’s currently used at Airbnb, and I was trying to come up with a list of other ones but I just don’t have a quick list. But it’s used in a bunch of places and it’s a very successful Apache project. Twitter has a long history of giving back to open source, and Mesos is one of our probably biggest open source successes right now, I would say.
Part of the Twitter service itself is the free flow of information, and so I think a lot of people that come to work here have a passion around that. Generally, inside Twitter engineering we prefer things to be open rather than closed, so where we can share we do. So yeah, it ties into the culture of Twitter itself and the product and how we build it.
There are some great benefits to open source. One is obviously you end up building quality into the product because it’s very transparent, everybody sees what’s happening. And then you get contributions back into the project, so then you can create a platform on which people can build new things and you can bring them back into the company.
WIRED: So is the Fail Whale a thing of the past now?
Fry: The Fail Whale is a thing of the past. Actually, this summer we took the Fail Whale out of production. So if you come to Twitter, and there are always gonna be problems, no service is ever perfect. But right now you will see robots instead of the Fail Whale. So the Fail Whale image is not served by Twitter anymore. It had a long history and some of our users feel very connected to it. But in the end, it did represent a time when I don’t think we lived up to what the world needed Twitter to be.
We are a service that people turn to in moments of joy, and also when things are going horribly wrong in the world. So I feel a personal commitment, as I think does everybody that works here, to having a service that’s available when anyone needs it. And sometimes Twitter may be the only thing that’s working during a flood or during a major disaster. So we’re very committed to being the most reliable service that we can be.
WIRED: Do you view Twitter as a key piece of communication infrastructure?
Fry: I do. When we think of the purpose of Twitter, what we’re able to do, making it so any person in the world can communicate with any other person, connecting all the people on the planet, that is an incredible mission to be on. We’re probably still early in that mission, but that is the goal: that any one person can communicate with every other person in the world.
WIRED: If you say you deleted Fail Whale, then people can’t get on Twitter, it seems like that’s really opening yourself up to criticism.
Fry: We even debated internally whether we would talk about that outside of the company because we’re still going to have issues here. We have had a long period of much more reliable service which gave us the confidence to say, we really feel we’ve made a substantive difference versus just a small change in how the service is operating. There will always be issues with Twitter. When I think about things that keep me up at night, one is the reliability of the service. The other is are our engineers as efficient as they can be? Do we have all the infrastructure to make sure they can rapidly deliver code so that we can iterate on their product quickly? I think we still can. I think there’s a world of innovation that is ahead of us with Twitter, we’ve only scratched the surface and there’s way more to come. Even though we’ve accomplished a lot, I think there’s still a lot to do.
If you’re always fighting reliability fires, you’re not innovating a product. So you have to have that core infrastructure layer in place so you can then make it more efficient and iterate upon it and build great consumer experiences. I think getting reliability in place is the first step towards really doing product innovation. Sometimes, you will feel like they’re in conflict. I don’t feel that way. I don’t.
WIRED: Is that why there have been so many new products coming out recently?
Fry: I do feel like going through the steps of creating a reliable service, getting to scale, making it efficient and then creating this mobile infrastructure where we can rapidly iterate has meant that we’ve been able to do things like MagicRecs and Event Parrot. Those are two of the things that I think really represent a special experience of Twitter because they’re in the moment.
So if you take Event Parrot… it’s sometimes hard to explain what Twitter is, but when Event Parrot’s on your phone, you become the first person in the world, maybe in your network, to know about something that’s happening. So it really brings the news quickly to you and what’s happening in the world. It makes Twitter very accessible. So I think this story of going from reliability to product innovation has let us experiment with things like that.
WIRED: What advice would you give to those tasked with fixing Healthcare.gov to make it more stable and scalable? Are there general principles or practices they should follow to fix a massive product that can’t go down while it’s being fixed?
Fry: I would give the same advice to almost any software organization: stay close to the people who are going to use your product, don’t spend a lot of time writing specifications, try to iterate quickly and get to a v.1 as soon as possible. You’ll want to get your software in the hands of people that will use it. It’s important to get a steel thread of functionality working end-to-end rather than building it out in layers, so work through a single use case that has you build some UI, logic and backend. Almost all software organizations end up fixing the plane while it’s flying.
Yes, MapQuest Still Exists — And It Has an Awesome New Mobile App
- 6:30 AM
You might remember MapQuest from 1999, back when you needed a printer to spit out your route before embarking on a road trip. While many of us have moved on to Google, Waze, or even Bing for our mapping needs in recent years, MapQuest actually still commands 20 percent of the mapping space. But the company is hoping some slick new mobile app offerings for iOS and Android will help the mapping icon become a household name once again.
MapQuest’s reinvented mobile app doesn’t waste effort on fancy flyovers or 3-D views of buildings. Instead, it focuses on getting you where you need to go as quickly and reliably as possible. To do that, it employs a clean, easy-to-use layout and customization features that make the app more efficient. The experience, while not as robust as Google Maps, manages to feel fresh, and offers convenient one-click features for some of the actions you use the most.
A slide-out menu on the right side of the app gives you the option to add a variety of layers to the basic map: traffic data, satellite imagery, hotels, food, and gas, among others. You can also personalize it by adding your own options to this menu: “In-N-Out Burger” or “Starbucks,” for example, if you crave burgers or pumpkin spice lattes while you head down the highway.
In the app’s left-hand menu, you’ve got more options, including buttons for “Go Home” and “Go to Work.” Input those addresses, and you can find your way to either location in a split second no matter where you are.
When you navigate to a location, the app automatically adds traffic information, showing congested stretches in yellow or red instead of the normal bright blue used to highlight your route. It checks traffic conditions every few minutes. If the app determines that congestion will significantly delay you, and it has an alternate route you can use, it shows a full-page pop-up alerting you and giving you the option to reroute. The pop-up even shows how much time you’ll save, what street you’ll take instead, and your estimated time of arrival at your destination.
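The reroute decision described above can be sketched roughly as follows. This is a guess at the general shape of such logic, not MapQuest's code: the `Route` type, the `reroute_offer` function, and the five-minute threshold standing in for "significantly delay" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Route:
    """A hypothetical route: the street it uses and its travel time in traffic."""
    via: str
    minutes: float


def reroute_offer(current: Route, alternates: list, now: datetime,
                  min_savings: float = 5.0) -> Optional[dict]:
    """Return the details a reroute pop-up would show, or None if no
    alternate saves at least `min_savings` minutes (an assumed threshold)."""
    if not alternates:
        return None
    best = min(alternates, key=lambda r: r.minutes)
    saved = current.minutes - best.minutes
    if saved < min_savings:
        return None
    return {
        "via": best.via,                                 # street you'd take instead
        "minutes_saved": saved,                          # time saved by rerouting
        "eta": now + timedelta(minutes=best.minutes),    # new arrival time
    }
```

Run on a timer every few minutes, a function like this would stay silent unless a meaningfully faster alternate exists, which matches the behavior the article describes.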
As you’re traveling along your route, the app also displays a handy progress bar along the top of the screen showing how far along you are. This bar uses the same color scheme as your route on the map, so if you are stuck in a block of yellow traffic, a glance at the phone display will tell you you’re almost out of it (or not).
The app also offers thoughtful micro directions along with the standard list of turn-by-turn ones. These include tips like “Bryant Street is just past Harrison Street. If you reach Brannan, you’ve gone too far,” which are good for your passenger-seat navigator to be aware of as they read directions out to you. Of course, you can also just have the app read the directions aloud itself.
The MapQuest app currently sticks to driving or walking directions — you’ll have to use another app if you want biking directions. Transit directions are coming next year. It’s available for iOS and Android today.