Facebook Open Graph Fun

Thursday, April 22, 2010

More detailed instructions on how to access facebook’s new Open Graph are below. Open Graph is an interesting OAuth-based mechanism by which facebook is opening their database to “select” third parties, allowing those parties to read FB cookies, automatically connect to FB, and read “engagement enhancing” information about the user such as their social graph, their profile, their news feed, the groups they belong to, and their pictures (including all that they’ve been tagged in): just about everything FB knows about them. The details are at this URL.

It is not 100% clear to me yet whether this actually gives the third party access to the facebook cookies, but if the techcrunch article is correct, then third parties can read FB cookies, which are all under the domain .facebook.com and all set to “send for: Any type of connection,” including the “lxe” cookie, which holds the user’s sign-in email address.
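To make those two cookie attributes concrete, here is a minimal sketch of how a browser typically decides whether to attach a cookie to a request. The `Cookie` class and `would_send` function are hypothetical names made up for illustration, and the rule is simplified (it ignores paths, expiry, and IP-address hosts). It only models browser behavior and says nothing about what facebook's servers might share with partners directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cookie:
    name: str
    domain: str        # scope as stored by the browser, e.g. ".facebook.com"
    secure_only: bool  # False corresponds to "send for: any type of connection"


def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified domain match: exact host or any subdomain of the cookie domain."""
    domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == domain or host.endswith("." + domain)


def would_send(cookie: Cookie, host: str, scheme: str) -> bool:
    """Would a browser attach this cookie to a request for scheme://host/ ?"""
    if cookie.secure_only and scheme != "https":
        return False
    return domain_matches(host, cookie.domain)


# Hypothetical stand-in for the "lxe" cookie described above (value omitted).
lxe = Cookie(name="lxe", domain=".facebook.com", secure_only=False)

for host, scheme in [("www.facebook.com", "http"),
                     ("graph.facebook.com", "https"),
                     ("www.cnn.com", "http")]:
    print(host, scheme, would_send(lxe, host, scheme))
```

Under this simplified rule, a cookie that is not marked secure-only travels over plain http as well as https to any facebook.com host, which is why the “any type of connection” setting matters for a cookie holding an email address.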

To experiment with Open Graph, first log in to facebook…

Then visit the graph API introduction page at http://developers.facebook.com/docs/api and try some of the links.

[Screenshot: Graph API - Facebook Developers page, with one example link highlighted]

For example, the highlighted link takes you to a facebook database object numbered “19292868552” at a very long URL that encodes both the object (green section) and your (temporary) OAuth credentials (red section).

This connects to the following information:

Oddly, you cannot arbitrarily modify the object ID and successfully retrieve data on the first try. That is, if you change the URL from:

.
to

.

before first going to the original link, you will get

{
  "error": {
    "type": "OAuthException",
    "message": "Error validating application."
  }
}

This suggests that the object is specifically authorized in the token. However, if you first visit the authorized link, you can subsequently modify the URL and substitute a different object ID. Zuckerberg’s is “4”.

Facebook has enumerated at least 20 billion objects (the original ID is 19 billion and something), so the chances of stumbling on a particular person’s data are pretty low, especially given that many (most?) objects return a null result:

{
  "id": "1"
}
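As a back-of-envelope check on how low those odds are, the sketch below works through the numbers in this post: roughly 20 billion object IDs and 400,000,000 users. It assumes IDs are spread uniformly and that each user occupies exactly one ID, neither of which is likely to be strictly true (low IDs such as “4” clearly are not random), so treat the results as rough bounds. The function names are made up for illustration.

```python
TOTAL_OBJECT_IDS = 20_000_000_000  # "at least 20 billion objects" (from the post)
USER_COUNT = 400_000_000           # facebook user count cited in the post


def chance_of_specific_object(guesses: int,
                              id_space: int = TOTAL_OBJECT_IDS) -> float:
    """Chance that `guesses` distinct random IDs include one particular object."""
    return min(guesses, id_space) / id_space


def share_of_ids_that_are_users(users: int = USER_COUNT,
                                id_space: int = TOTAL_OBJECT_IDS) -> float:
    """Fraction of the ID space occupied by users, one ID per user (assumed)."""
    return users / id_space


print(f"one guess hits a given person: {chance_of_specific_object(1):.1e}")
print(f"a million guesses hit them:    {chance_of_specific_object(1_000_000):.1e}")
print(f"share of IDs that are users:   {share_of_ids_that_are_users():.1%}")
```

Under these assumptions, even a million blind guesses would be very unlikely to find one specific person, and at most about one ID in fifty would belong to a user at all.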

Conveniently, you can also substitute a username for the graph object, for example “billgates,” and get their basic profile information.
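The substitution described above, whether of a numeric object ID or a username, amounts to swapping one path segment in the request URL. Here is a minimal sketch, assuming the commonly documented form `https://graph.facebook.com/<object>?access_token=<token>`. The base URL, the `access_token` parameter name, and the `graph_url` helper are assumptions for illustration, and the URLs from my own session are not reproduced. `"TOKEN"` is a placeholder, not a real credential.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.facebook.com"  # assumed base URL for the Graph API


def graph_url(object_ref, access_token=None):
    """Build a request URL for an object ID (e.g. 4) or a username (e.g. "billgates")."""
    # Percent-encode the reference so it cannot break out of its path segment.
    path = quote(str(object_ref), safe="")
    url = f"{GRAPH_BASE}/{path}"
    if access_token:
        url += "?" + urlencode({"access_token": access_token})
    return url


print(graph_url(4))
print(graph_url("billgates", access_token="TOKEN"))
```

Keeping the token separate from the object reference mirrors the behavior described earlier: the credentials stay the same while the object part of the URL is changed.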

Profile information includes (if specified as public) relationship status, “interested in” status, birthday, religion, websites, work history, and education. It appears that the “only share certain information” flag is respected and the results are abbreviated based on your graph permissions.

So far it appears that this effectively protects privacy, to the extent that the API does not let just anyone get anything: you at least have to visit the requesting site first. If you jump to one of the example links on the API introduction page, say Bret Taylor’s message (ID: 367501354973), and poke around in that address space, so far I only get (for example, for ID 367501354974):

{
  "error": {
    "type": "GraphMethodException",
    "message": "Unsupported get request."
  }
}

Aside from the automatic login function and the risk of email address exposure, the controls appear to ensure this isn’t an easy backdoor into Facebook’s database. It is, though, a fairly straightforward API for spidering the site and getting well-structured results back.
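Because the results are well structured, a script poking around the ID space could sort responses into the three shapes seen in this post: an error object, a bare `id` stub, or actual data. The following is a minimal sketch using only the response shapes shown above. `classify_graph_response` is a hypothetical helper, and the real API may return other shapes that this does not anticipate.

```python
import json


def classify_graph_response(body: str) -> str:
    """Label a response body as 'error:<type>', 'empty', 'data', or 'other'."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return "other"
    if "error" in data:
        err = data["error"]
        err_type = err.get("type", "unknown") if isinstance(err, dict) else "unknown"
        return f"error:{err_type}"
    # An object carrying nothing beyond its own id is the "null result" case.
    if set(data) <= {"id"}:
        return "empty"
    return "data"


samples = [
    '{"error": {"type": "OAuthException", "message": "Error validating application."}}',
    '{"id": "1"}',
    '{"error": {"type": "GraphMethodException", "message": "Unsupported get request."}}',
]
for body in samples:
    print(classify_graph_response(body))
```

Separating the error type from the empty case makes it easy to tell a token problem (OAuthException) from an object that exists but exposes nothing.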

Offering automatic log-in to third parties raises privacy and trust issues: it presumptively extends the implicit trust that a facebook user gives facebook to all third parties facebook chooses to authorize. I am dubious that the majority of facebook users would be comfortable with this: it is not unlike giving a friend a key to your house and then finding out they later decided that they should be allowed to give copies to anyone they trust. You may trust your friends, but does that mean you automatically trust all of their friends? I may trust facebook with my relationship status, but that doesn’t mean I automatically trust everyone they want to do business with to have the same information.

Further, there is a definite security risk: whenever a person puts data out on the web in anyone else’s hands, whether it be facebook or Google or anyone else, that data is a gift to the recipient. The “Cloud” is built on the value of that data. One cannot assume that the billions of dollars it costs to host cloud services are being spent on users out of the goodness of the hearts of venture capitalists. If you don’t own the hardware, you don’t own the data (whoever possesses the hardware on which the data is stored owns it, and if you’re using a cloud service, that is not you).

While we may (or may not entirely) be comfortable trusting that the business model of facebook or Google is enhanced advertising targeting through the information you share (be it your age, gender, religion, etc., or just a search term), concentrating information in centralized services such as facebook and Google creates irresistibly juicy targets for those who might choose to exploit the value of the information without paying to build the service that users found worthy of giving it up for in the first place. It is inevitable that these targets will be, and have been, compromised by those with malicious intent. Even the presumptively most technologically savvy company on earth has been catastrophically compromised, though to their credit they have been (apparently uniquely) open about the breach.

Facebook’s Open Graph model extends the security requirements to the third parties they authorize. It is likely that many of these third parties (CNN, Pandora, Yelp, etc.) are not capable of maintaining the same level of security that Google and Facebook do (and even that fails), and by making these sites entrances into facebook’s data store, they become very juicy targets themselves. A crude exploit would be to compromise a third party website, wait for some of the 400,000,000 facebook users to log in, find an older user, parse their stream for children, then pull a “relative in distress” scam. If the automatic login also gives access to the “lxe” cookie with the user’s verified email, that has value in itself: just verifying bulk email addresses costs money, and facebook’s 400,000,000 are worth $40,000 in verification charges, or approximately $240,000 for a complete set.
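For a sense of scale, the two dollar figures above can be turned into implied per-address rates. This sketch only divides the totals quoted in this post by the user count. It does not reflect any particular vendor's price list, and the helper names are made up for illustration.

```python
USERS = 400_000_000           # facebook user count cited in the post
VERIFICATION_TOTAL = 40_000   # dollars, bulk verification figure from the post
COMPLETE_SET_TOTAL = 240_000  # dollars, approximate "complete set" figure from the post


def per_address(total_dollars: float, addresses: int = USERS) -> float:
    """Implied cost in dollars for a single address."""
    return total_dollars / addresses


def per_thousand(total_dollars: float, addresses: int = USERS) -> float:
    """Implied cost in dollars per 1,000 addresses."""
    return per_address(total_dollars, addresses) * 1000


for label, total in [("verification", VERIFICATION_TOTAL),
                     ("complete set", COMPLETE_SET_TOTAL)]:
    print(f"{label}: ${per_address(total):.4f} each, ${per_thousand(total):.2f} per 1,000")
```

Those totals work out to one hundredth of a cent per address for verification and six hundredths of a cent for a complete record. The value is in the volume rather than in any single address.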

Putting data in the cloud is always a huge risk: you should assume that you are giving up ownership and control of any data you store on anyone else’s server, whether it be Google or Facebook or Amazon. You have no control over what the company does with the data, little if any recourse if you disagree with what they choose to do, and no control over the life cycle of the data (for example when the company is sold or fails and is parted out) nor over the hardware your data is stored on (which may end up on ebay someday). The only safe approach is to keep any sensitive information on your own hardware, in your own facility, under your own lock and key, secured to the degree you determine appropriate. Use the cloud for data you want everyone on earth to know anyway.

Gessel On…
