For a while now, I’ve been meaning to post about standards and how, though we encounter them every day, they remain largely invisible. Standards grease the wheels of interoperability and cooperation, and nearly every one seems to have its own detailed Wikipedia page. Here are three of my favorites.

Look, But Don’t Touch: My New Facebook Strategy

Much has been said about Facebook’s recent changes concerning the privacy of user data. Michael Zimmer and Fred Stutzman provide enlightening details and perspective, and concern is going “mainstream”: it’s in the New York Times. In short, a whole lot of stuff now cannot be private under any circumstances.

But I think the most unsettling thing, besides the lack of control, is for many people the uncertainty: the feeling that the privacy ground keeps shifting beneath our feet. With over 50 (and growing) different settings to think about (see the NYTimes infographic) and subtly polysemous terminology (“pages”, “like”, “connections” and so on), it’s hard to know not only what the universe of settings consists of, but what each setting’s options mean for the sharing of your information. And I say that as a tech (cough) “elite.”  What about the 399M others who aren’t deeply versed in cookies, caches, config files, and related technocrap?  They’re screwed. And when they find out, they become scared and outraged.  I recently explained to someone some of the ways one’s information gets shared, and got the shocked reply, “You mean when [my son] plays that stupid Mafia game, Facebook gives them my information?”  I wouldn’t be surprised if more people do start to bail out.

So I have a new Facebook strategy for myself and I’m recommending it to others: “look but don’t touch.”  I’ll log into Facebook and see what my friends have posted, maybe comment on or “like” their statuses or photos, but that’s about it. But wait, you might say, if everyone does this, won’t there be no content left in Facebook?  Well, sure, but that’s the point. Like many people, I’m sure, I’m hesitant to leave Facebook entirely, because I do derive a great deal of enjoyment from it. But that enjoyment derives from the social experiences I’ve had there, not from Facebook per se — until now they’ve just done a remarkably good job at hosting it all.  So when my friends want to host their social experiences at Facebook, I’ll be happy to attend and participate and respond.  But I’ll host my own social experiences elsewhere for now, and if my friends want to be part of them, they can follow me there. I like being public — I tweet a lot!  I keep a fully-stocked RSS reader, occasionally post here on my own blog, am on email and IM and Skype all day, and keep my personal webpage current. I want a fully-connected digital life in which privacy means not invisibility, but rather control.

This approach also includes cleaning out my profile (no links to interests, groups, universities, geographic areas, or unnecessary demographic information), the existence of which is kind of perverse anyway. Who is this information for (besides marketers)?  My friends already know where I have worked and gone to school.  So what’s left?  Am I expected to make a new friend online because we both like This Old House?  Isn’t that the kind of thing newsgroups were good for?  Maybe listing your schools and workplaces is a good way to find old friends/coworkers, but that’s just a shortcut. For nearly everyone (see: giant component), if you click on your friends’ friends long enough, you’ll find everyone you’re looking for.  Remember, you’re already connected (socially) and you can leverage that to get connected (digitally).  Don’t forget that the latter is just a representation of the former.

I think the account closing/cancellation approach is misguided because, even though it accomplishes the goal of keeping one’s own information private, it does so by depriving oneself of valuable social experiences with one’s friends.  “Look but don’t touch” instead slowly weans people off of Facebook and makes them less reliant on a single gateway to social life online.

Finally, it’s worth pointing out that another reason I’m not deleting/closing my Facebook account is that I’m optimistic they’ll get it right eventually, and I want to be there when they do.  Plus, I am geeky-proud of my under-2000 user ID number, having signed up in March 2004. It has elicited compliments from other nerds — I’d have to be crazy to give that up.

Ironic welcome

Why is my daily pageview count today six times larger than it’s ever been?

Because yesterday I had the pleasure of having lunch with Tyler Cowen during his visit to Cornell’s Behavioral Economics and Decision Research center. It turns out he was kind enough to link to my blog from Marginal Revolution. Welcome new visitors!

Why the ironic welcome? Because the post he linked to discourages adding new things to your RSS feed reader. Kind of self-defeating on my part, perhaps, but I hope you’ll stay.

Scaling Social Science with Hadoop

A friend at Cloudera recently invited me to write a post for their corporate blog about how social scientists are using large scale computation.

I’ve been using Hadoop and MapReduce to study some really large datasets this year. I think it’s going to become more and more important and open the world of scientific computing to social scientists. I’m happy to evangelize for it.
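Since the post mentions MapReduce only in passing, here is a toy sketch of the pattern itself: the canonical word count, with Hadoop’s shuffle step simulated in-process. None of this is from the actual Cloudera post or my research code; with Hadoop Streaming, the map and reduce steps would run as separate scripts over data in HDFS.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Group pairs by key (the 'shuffle') and sum each word's counts."""
    shuffled = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(shuffled, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# Toy corpus standing in for a "really large dataset"
docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = dict(reduce_phase(map_phase(docs)))
```

The point of the pattern is that `map_phase` and `reduce_phase` are embarrassingly parallel: Hadoop runs many mappers and reducers across a cluster and handles the shuffle between them.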

One of the ideas that didn’t make its way into the final version is that even though the tools and data are becoming more widely available to laypeople, asking good social science questions — and answering them correctly — is still hard. It’s comparatively easy to ask the wrong question, use the wrong data, or draw the wrong inference, especially if the wrongness is subtle. As an example, I think the OkCupid blog is interesting, but it’s not social science.

Social science has long been concerned with sampling methods precisely because it’s dangerously easy to incorrectly extrapolate findings from a non-representative sample to an entire population. Drawing conclusions from internet-based interactions can be problematic because the sample frame doesn’t match the population of interest. Even though I learned to make a cigar box guitar from Make Magazine, I don’t assume I know that much about acoustic engineering. Likewise, recreational data analysis is fun, illuminating and perhaps suggestive of how our social world works, but one ought not conclude that correlations or trends tell the whole, correct story. However, if exploring and experimenting with data can spark an interest in quantitative analysis of our social world, then I think it’s all for the better.


Who can you cite?

In a conversation this morning with some of my fellow sociology grad students, we were lamenting the length of the theory / literature review sections of sociology publications. Reading them is tedious, and having to write them puts those of us who do interdisciplinary work at a distinct disadvantage, compared to those in disciplines that favor shorter, timelier papers.

Completely separately, the other day I was reading (Cornell’s own) Steven Strogatz’s excellent New York Times blog making math accessible — and interesting — to non-mathematicians. In the most recent post, he mentioned the well-known, silly formula for bounding socially acceptable age differences in dating: the minimum acceptable age for a dating partner is defined as (n / 2) + 7, where n is one’s own age. *

These two things go together, I promise.

I propose a new heuristic for deciding what previous scholarly works to cite — do not cite any work produced a larger number of years ago than twice your age. It is 2010 and I was born in 1980, so the earliest acceptable work for me to cite is 1950. This is good, because I am now old enough for The Human Group, for example, but bad because I won’t be able to cite Simmel until I’m 80. But maybe that’s not so bad. As of now, younger and mid-career professors can cite most anything post-WWII, which seems pretty reasonable. Many senior faculty can reach back to Weber and Durkheim. Marx, on the other hand, is approaching the event horizon.
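For the literal-minded, the heuristic reduces to a bit of arithmetic (the function name here is my own invention, not anything standard):

```python
def earliest_citable_year(current_year, birth_year):
    """Cite nothing more than twice-your-age years old: the cutoff
    is the current year minus two times your age."""
    age = current_year - birth_year
    return current_year - 2 * age
```

So for me in 2010, born 1980, the cutoff is 1950 (hello, The Human Group), and Simmel (circa 1900) opens up right on schedule in 2060, when I turn 80.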

The benefits of this are clear. Younger scholars can focus on developing concrete findings but retain the ability to fit those findings into the theoretical developments of the past several decades. By the end of one’s career, in contrast, one gains the flexibility to situate one’s work in the larger context of the full intellectual history of the discipline.

* Note that the inverse function doesn’t necessarily give the maximum acceptable age for a dating partner, just the age of the person for whom you represent the minimum acceptable age. It’s an open question whether acceptability goes in both directions: if a is an acceptable dating partner for b, is b an acceptable partner for a? I don’t think it is. There are perhaps some age disparities in which either the younger or older partner would be seen as making a normatively unacceptable choice, but the other partner would not. I’d also suggest that (n / 2) + 7 is gender-specific. But I’m not a demographer or a gender scholar, and this isn’t even close to science anyway.
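To make the footnote concrete, here are the rule and its inverse as code (function names mine):

```python
def min_acceptable_age(n):
    """The folk rule: youngest acceptable partner for someone aged n."""
    return n / 2 + 7

def inverse_min_acceptable(m):
    """Age of the person for whom age m is the minimum acceptable.
    This is the algebraic inverse of min_acceptable_age, NOT a
    maximum-acceptable-age rule."""
    return 2 * (m - 7)
```

A 30-year-old’s floor is 22, and the inverse maps 22 back to 30: the 22-year-old is the youngest person a 30-year-old can date, which says nothing about the oldest person a 22-year-old can.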

Control Your Impulse Surfing!

Even after a merciless purge, my Google Reader still has over 90 feeds in it, which generate several hundred items to read every day. After a quick skim and cull, there are still a dozen or two articles or long blog posts a day I’d like to read. Combine that with the things my Twitter followees post (a higher signal/noise ratio than the RSS feeds) and it’s more than I can responsibly spend time on.

Today I thought of a nifty hack to control my “impulse reading” — things that I read on a whim during a bout of web surfing. It adapts a popular trick from personal finance to control impulse spending, which is to wait 30 days before making a purchase.

When I encounter an article I’d like to read, I open it in a new tab in Firefox and leave it there. Right now I have about a dozen tabs open. Some of them have been there for days. Invariably, when I make my way back through them, I read maybe 1/3 of them. Most of them just don’t seem as interesting anymore.

This has two other benefits. First, I can play “inbox zero” with my Google Reader, so I don’t feel like lots of things are hanging over my head, unchecked: every item gets marked as read, whether I actually read it, skip it, or open it in a tab for later. Second, I “batch” all my recreational reading into a contiguous chunk so that it doesn’t continually interrupt me during the day.

Oprah, Iran and Twitter Growth

I am spending this summer at Microsoft Research in a group that is studying many aspects of Twitter. This post is co-written with Sarita Yardi.

Twitter is all the rage lately. Media personalities in journalism, sports and Hollywood have started tweeting, and the masses have followed. Taking a break from our regularly-scheduled research, we wanted to see what effect these media personalities have had on Twitter’s growth. The answer is: a lot.

[Figure: Twitter account age distribution]

We randomly selected about 70,000 Twitter accounts* and plotted them by when they were created. The x (horizontal) axis is the age of the account, so “0” (far left) is the most recent account (as of Tuesday night) and the far right represents accounts created about 1000 days or nearly three years ago.

About six months ago, the number of new Twitter accounts exploded. Within that overall growth, two spikes stand out: one starting about two weeks ago and another about 10-11 weeks ago. These coincide with two important media events surrounding Twitter.

The first comprises the celebrity trifecta of Ashton Kutcher, Oprah Winfrey and Larry King. Ashton and Oprah’s accounts were created in January (about 6 months ago, coinciding with the increase in new users), and King followed in March. Steady growth continued throughout the first part of the year, but things became more interesting in April.

On 4/13, Kutcher appeared on CNN and they announced a challenge: who could reach one million followers first? Oprah jumped on this bandwagon by having Kutcher on her show and sending her first tweet on 4/17. This coincides with the first (right-hand) peak. The peak was short-lived, and account creation decreased until nearly two months later when the Iran crisis developed, but that’s another story.

The question is, did Oprah/Ashton/Larry cause the spike in new accounts, or did they go along for the ride?

It is too early to see the after-effects of the Iran peak, but it will be interesting to see whether people who create accounts for news/geopolitical reasons are more or less “sticky” for Twitter than those who joined for entertainment reasons.

This post is sort of an advertisement for our group; several academic papers will be coming out of this summer’s work on Twitter, and we have already released a draft version of “Tweet, Tweet, Retweet” (boyd, Golder and Lotan), which describes Twitter users’ retweeting practices. Stay tuned for more.

* Methodology: We approximated the most recently assigned user ID and generated user IDs uniformly between zero and this maximum user ID. We ignored any user IDs that mapped to a nonexistent or suspended user account.
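That sampling procedure can be sketched as follows. The `lookup` callable stands in for a real Twitter API call (which the post doesn’t detail), and the even-IDs-are-live universe in the usage example is purely a toy:

```python
import random

def sample_accounts(max_user_id, n, lookup, seed=None):
    """Draw user IDs uniformly at random between 0 and the (approximate)
    maximum assigned ID, keeping only IDs that resolve to a live account.
    Rejected IDs model nonexistent or suspended accounts."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        user_id = rng.randrange(max_user_id + 1)
        account = lookup(user_id)  # returns None for dead/suspended IDs
        if account is not None:
            sample.append(account)
    return sample

# Hypothetical usage: pretend only even IDs are live accounts.
live = sample_accounts(100, 10, lambda i: i if i % 2 == 0 else None, seed=42)
```

Sampling uniformly over the ID space (rather than, say, crawling the follower graph) is what makes the account-age histogram above interpretable: each live account has the same chance of appearing, so the shape of the histogram reflects when accounts were actually created.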