Getting started with 入門ソーシャルデータ: Chapter 1, Introduction

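Authors: Matthew A. Russell, 奥野陽 (supervising translator), 佐藤敏紀 (supervising translator), 瀬戸口光宏 (supervising translator), 原川浩一 (supervising translator), 水野貴明 (supervising translator), 長尾高弘
Publisher: オライリージャパン
Release date: 2011/11/26
Media: large-format book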

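This is only chapter 1, so nothing goes very deep yet.
It mainly covers how to use the Twitter API, how to handle the data,
and how to use the Python modules that process that data.
For now I am typing in the source code exactly as printed in the book,
but annoyingly, a lot of it does not run..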

The setup is as follows.
(This part is easy on Ubuntu.)

sudo apt-get install python-networkx python-nltk python-numpy

○Example 1-3
I did not expect things to stop working this early..
Apparently the way to call Twitter's trends API has changed.
The fix was written up at http://holidayworking.org/memo/2011/11/21/2/

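import twitter

# Ask for trends by WOEID; 1 is the worldwide WOEID
twitter_api = twitter.Twitter(domain='api.twitter.com', api_version=1)
trends = twitter_api.trends._woeid(_woeid=1)
# Pull just the trend names out of the first element of the response
[trend['name'] for trend in trends[0]['trends']]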
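
The last line of that listing implies a particular response shape: a list whose first element holds a 'trends' list of dicts, each with a 'name' key. Here is a minimal offline sketch of just that step. The `trends` value below is a hand-written stand-in with made-up names, not real API output.

```python
# Hand-written stand-in for the API response (assumption: made-up values,
# shaped only so that trends[0]['trends'] works as in the listing above)
trends = [
    {
        "trends": [
            {"name": "#example"},
            {"name": "Python"},
        ]
    }
]

# Same list comprehension as in Example 1-3
names = [trend["name"] for trend in trends[0]["trends"]]
print(names)
```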

○Example 1-11
This one is a typo in the book, plus a change in nx.DiGraph's behavior?

import networkx as nx
import re

g = nx.DiGraph()
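# search_results is not defined in this listing; it is assumed to hold the
# search result pages fetched earlier in the chapter
all_tweets = [tweet for page in search_results for tweet in page["results"]]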

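def get_rt_sources(tweet):
    # Match "RT" or "via" followed by one or more @mentions
    rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
    # findall returns (marker, mentions) tuples; keep only the mentions
    return [source.strip()
            for groups in rt_patterns.findall(tweet)
            for source in groups
            if source not in ("RT", "via")]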

for tweet in all_tweets:
    rt_sources = get_rt_sources(tweet["text"])
    if not rt_sources: continue
    for rt_source in rt_sources:
        g.add_edge(rt_source, tweet["from_user"], {"tweet_id" : tweet["id"]}) # the book has )} here; it should be })

g.number_of_nodes()
g.number_of_edges()
g.edges(data=True)[0]
len(nx.connected_components(g.to_undirected()))
sorted(nx.degree(g).values()) # without .values() the output does not match the book
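
The retweet extraction in Example 1-11 can be checked without Twitter access or networkx. The following is a minimal sketch, not the book's code. The sample tweets are made up for illustration, and the edges are kept in a plain list of tuples rather than an nx.DiGraph.

```python
import re

# Same pattern as in Example 1-11: "RT" or "via" followed by @mentions
RT_PATTERN = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

def get_rt_sources(text):
    """Return the @mentions that follow an RT/via marker in text."""
    return [source.strip()
            for groups in RT_PATTERN.findall(text)
            for source in groups
            if source not in ("RT", "via")]

# Made-up tweets (assumption: same keys as the search results in the listing)
sample_tweets = [
    {"id": 1, "from_user": "alice", "text": "RT @bob: networkx is fun"},
    {"id": 2, "from_user": "carol", "text": "nice article (via @bob)"},
    {"id": 3, "from_user": "dave", "text": "no retweet here"},
]

# Collect (source, retweeter, tweet_id) edges, as the loop in Example 1-11 does
edges = []
for tweet in sample_tweets:
    for rt_source in get_rt_sources(tweet["text"]):
        edges.append((rt_source, tweet["from_user"], tweet["id"]))

# Every name that appears at either end of an edge is a node
nodes = {name for source, target, _ in edges for name in (source, target)}

print(edges)
print(len(nodes), len(edges))
```

Note that the extracted sources keep their leading "@" while from_user values do not, so the same account can show up as two different nodes.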