This year the 3rd conference on Advances in Social Network Analysis and Mining was held in Kaohsiung, Taiwan. Kaohsiung is the 2nd largest city in Taiwan with around 2.9 million inhabitants. During my short stay there I couldn't help but notice that Taiwan (or at least Kaohsiung) is a blend of Japanese efficiency and cleanliness, with Chinese culture and influences.
The organisation of the conference was impeccable. Everything functioned perfectly, the internet always worked, the guests were well looked after and the presentations all went smoothly. Lunch boxes were provided every day, and the two dinners, particularly the banquet served on the 41st floor of Kaohsiung's highest building, the Tuntex Sky Tower (an 85-storey skyscraper), were exquisite. The real motor behind the organisation was the team of student helpers directed by I-Hsien Ting. The students were omnipresent, ever smiling and always ready to help. A truly good job.
These are some noteworthy papers that I came across during the conference (mainly through the sessions I sat through). This is not an exhaustive list by any means. For more information and a summary of each paper consult the conference program on the ASONAM 2011 website.
Matteo Magnani and Luca Rossi. The ML-model for multi-layer social networks. - In one of the best papers in the conference, the authors propose a model to combine the various heterogeneous online personas in a unified network perspective. I believe that the topic of multi layer networks will receive a lot of attention in the near future, making this paper particularly relevant at this point in time.
Chien-Tung Ho, et al. Modeling and Visualizing Information Propagation in a Micro-blogging Platform. - This is another best paper award winner exploring information propagation in micro-blogging systems (using Plurk). The three research questions explored are: (1) How to quantify a person’s capability to disseminate ideas via a micro-blog. (2) How to measure the extent of propagation of a concept in a micro-blog. (3) How to demonstrate and visualize information propagation in a micro-blog.
Iraklis Varlamis and George Tsatsaronis. Visualizing Bibliographic Databases as Graphs and Mining Potential Research Synergies - In this paper the authors use power graphs, a lossless graph compression technique developed for biological networks, to visualise bibliographic networks. I see a lot of potential for power graph visualisation in social networks. This paper is a good idea generator in the visualisation field.
Tomoyuki Yuasa and Susumu Shirayama. A New Analysis Method for Simulations Using Node Categorizations - This is another interesting paper using visualisation that explores Self Organising Maps to cluster and then visualise similar actors in a network.
Juan Lang and Felix Wu. Social Network User Lifetime. - The key question explored in this research is 'what keeps users engaged and active in social networking sites'.
Michael Farrugia, Neil Hurley and Aaron Quigley. SNAP: Towards a validation of the Social Network Assembly Pipeline. - Some shameless self-promotion here. The main theme of this work is how we can collect a ground truth dataset to validate our method for inferring social networks from electronic data.
Marina Danilevsky et al. SCENE: Structural Conversation Evolution NEtwork - Can you identify someone based on the change in his communication pattern while chatting to someone else? A very interesting question studied using IM data, with initial promising results.
Fergal Reid, Aaron McDaid, Neil Hurley. Partitioning Breaks Communities. - Is a non-overlapping or an overlapping community detection approach better for clustering a graph? In this paper the authors use the measure of 'breaking cliques' to evaluate different community detection algorithms on various datasets.
Charles Perez et al. SPOT 1.0: Scoring Suspicious Profiles On Twitter - Beyond the great title, this paper analyses tweet content to identify suspicious profiles. Interesting analysis.
The conference also had six interesting keynote speakers. Two of the keynotes, by Arno Reuser and Johnny Engell-Hansen, were related to open source intelligence and how social networks can help intelligence services. Philippa Pattison presented research on statistical models (ERGMs). Yutaka Matsuo discussed web mining to develop personal search engines. The prolific author Jiawei Han gave a summary of work from his research group in Illinois on data mining algorithms. The last keynote was by Ming-Syan Chen on information processing in social networks.
Some pictures of the conference are already uploaded on the conference Facebook page.
Next year the conference is in Istanbul, Turkey.
Our temporal tree rings layout paper won a best paper award at ACHI 2011
Many of the current dynamic network visualisation methods and techniques rely on node-link force-based models that were originally developed for visualising static network snapshots. In this study, we diverge from this traditional layout approach and develop a layout for ego networks that places the time dimension in the foreground, by turning time into an element of shape. In addition to this we develop an interactive system that enables the visualisation of multiple networks simultaneously by employing small multiples. Using the proposed layout and analytical system as a grounding visual structure, we visually characterise dynamic network events in 3 different networks: the evolution of the biotechnology field, a phone call data set and a network of passenger connections of an airline. From this analysis we propose a range of ego network visual motifs that can be used as templates to identify and characterise events that are occurring in a dynamic network.
You can download and read the paper here
How often do you get the feeling that as soon as you drop a player in your fantasy league, his stats skyrocket? All too often I’d say. Well, now that the regular NHL season is over, I set out to do a post-mortem analysis of how effective my add / drop decisions were.
The aim of the visualization is to plot the performance of each player who was ever part of the fantasy team, highlighting the period when he was on the roster. The visualisation plots the total number of fantasy points of each player per week (using a sliding window weighted average to smooth the curves). If the player was on the roster during a given period, the line is highlighted in red. If the player was not on the roster, only a blue line is shown (and the red line stays at 0). Injuries are shown as a break in the line (i.e. the player did not play any game that week).
The visualisation is intended to help analyze player adds and drops, to see where the player was picked up based on his performance. For instance one can see examples of good decisions - adding Thomas Vanek towards the end of the season and dropping Paul Stastny towards the end of the season. From the charts one can also see the variance of each player with mountains and valleys (Gaborik, Havlat, Sharp) and the respective consistency of other players (Henrik Sedin, Christian Ehrhoff and Niklas Kronwall).
The data extraction, from the Yahoo Sports website, was automated using Excel’s data capture and a few macros. The visualisations were generated using Google Charts.
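The smoothing step can be sketched in a few lines of Python. This is only an illustration of a trailing weighted window, not the Excel macros actually used for the charts; the function name, the window length of three weeks and the weights (1, 2, 3) are all assumptions chosen for the example.

```python
def weighted_moving_average(points, weights=(1, 2, 3)):
    """Smooth a weekly series with a trailing weighted window.

    The last weight applies to the current week and the earlier weights
    to the preceding weeks. At the start of the series, where fewer
    weeks are available, the window is simply truncated.
    The weights (1, 2, 3) are an assumption made for this sketch.
    """
    smoothed = []
    for i in range(len(points)):
        window = points[max(0, i - len(weights) + 1): i + 1]
        used = weights[-len(window):]  # keep the most recent weights
        total = sum(p * w for p, w in zip(window, used))
        smoothed.append(total / sum(used))
    return smoothed


# Made-up weekly fantasy points for one player.
weekly_points = [10, 4, 16, 8]
print(weighted_moving_average(weekly_points))
```

Giving the current week the largest weight keeps the curve responsive to a hot or cold streak while still ironing out single-week spikes.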
This year the VAST 2008 keynote was delivered by Christian Chabot, CEO of Tableau Software. The main item on the agenda was to prove that Information Visualization is about to explode in popularity. Explode along the lines of the way Adobe products like Photoshop and Acrobat exploded. Explode in the sense that anyone who processes data, from huge companies, to small companies, to Joe the plumber, will be using visual analytic software.
Chabot made some very strong, at times ironic, at times provocative, comments about the state of information visualization. The main focus was on visualization in industry, where his product is targeted. He showed some neat demos using Tableau, mainly stressing the simplicity of using a “traditional” visualization and interacting with it to get more information.
The approach he took was to try and dispel some strong myths about information visualization. One such myth was that people use information visualization to find hidden patterns in data. He said that the number 1 reason why people buy Tableau is to save time.
When an analyst uses a visualization to answer a question he typically ends up with another question. The users must then have the ability to answer that new question by either creating a new visualization or refining the original visualization. Since it’s so easy to create a new visualization, or refine an existing one, in Tableau, this alleviates the need to create a single complex visualization.
Another key point was that "Information Visualization is NOT as difficult as you think". Most problems people are trying to solve on a daily basis can be easily solved by traditional visualizations. I tend to agree with this; however, that does not mean that we shouldn't try to solve bigger problems.
Some of the comments and arguments made by Chabot were quite provocative. I found it quite strange that nobody from the InfoVis audience challenged what he said. I was expecting some sort of reaction which never came.
I cannot help but notice the usual split between industry and academics. It's something that always interests me a lot since I’m sitting in between the two corners. At one extreme of the spectrum I see purely academic people trying to display a million node graph, without any practical application. At the other extreme somebody is saying that information visualization is easy: just make it easily accessible and people will use it.
I think one of the nice things about the VisWeek conference is that it brings these two extremes and everything in between in a single room. I believe this is very beneficial for the overall community, both the academic community and industry. There is a huge amount of great work being done in the research community that can be exploited by industry, for the benefit of both parties.
I think one of the strengths of Tableau was that it built on a very solid foundation of Information Visualization, design and usability principles that were based on research. To this effect several papers have been published in this same conference about Tableau. I think more people and companies can benefit from being bridges between academic research and the industry.
Hopefully Chabot will be proved right in his prediction that Tableau will be the new Acrobat in the next 5 to 10 years. From personal experience with using the product, I think if there’s a product out there that is on the cusp of achieving this, then that product is in fact Tableau.
These are some of the interesting points discussed during events I attended. They’re a bit sketchy as they’re meant more as reminders than full blown posts.
- Scatterplot Matrix Navigation (Niklas Elmqvist et al)
- This paper was the winner of the best paper award. I think it might resurrect some interest in scatterplot matrices, and I see potential for using concepts from here in my work. I love matrix views, so needless to say this was a very interesting take on it. Need to look into the paper in further detail though.
- Interaction Costs of InfoVis (Heidi Lam)
- Review of user studies related to interaction costs. Made some valid points on the importance of considering interaction features when evaluating systems. Importance of interaction not being a cost but an aid. Something else to think about during user studies.
- Color in Information Displays (Maureen Stone)
- Tutorial on usage of colour in displays. Reiteration and emphasis on Tufte's principle to DO NO HARM with color. Presented two interesting case studies on how colour was designed for Tableau and voting kiosks. The subtle details of the design, which depend on the application requirements, were well explained in the case studies. An interesting point to follow up on was the relationship of colour and language. Other points:-
- VAST Challenge Participants discussion (Georges Grinstein, Catherine Plaisant, Mark Whiting et al)
- Importance of creating data sets with ground truth.
- Possibility of automatically judging analytical tasks.
- Relevance of this area to an InfoVis Grand Challenge.
Yesterday was my first day, and also the first time at the VisWeek conference. Needless to say I was a bit overwhelmed by everything going on around, landing in a hotel surrounded by people I was in awe of (some of them are real humans), not knowing anybody and feeling like a Lilliputian amongst giants.
In the evening there was a discussion session about the VAST Challenge with all the participating teams (73 in total). This was the event I was most looking forward to, and the event where I was hoping to meet some interesting people and make some new friends in the InfoVis community. So we sat down for the discussion and after a brief introduction the participants started making their comments. Comments from the participants started flying out like submachine gun fire, and the analogy isn't entirely out of place. It seemed that all anybody in the audience wanted to do was criticise the organisers.
I remember that when I was tackling the challenge, I found the dataset interesting and challenging, and I appreciated the work involved in generating it. Sitting there amongst the audience hearing all these negative things being thrown at the organisers, I almost felt that they were offending me. I went home thinking what a bunch of proud, arrogant people. Is this the community I want to be part of?
I woke up this morning and I was still thinking about this. (It's 6am when I'm writing this). This morning though, maybe because of the caffeine dose, I started rationalizing. I thought, well, maybe a community needs critics. Maybe to improve something and make it better next year, there have to be people who criticise. Some of the criticism was valid, when you think of it rationally and leave your personal emotions behind. Needless to say though that some of the criticism was not constructive at all and was only a big dick wiggling exercise.
Thinking about it, I think critics do play a role in a community. Their suggestions can help improve the product each year, and I think that the VAST Challenge is living proof of this, considering the great progress the challenge has made since it started.
Having said this, being constructive, offering suggestions, and adding some sugar coating around the negative comments doesn't hurt either. Your professional peers were involved in this work, so having some tact and showing appreciation is due. I'm sure that the vast majority of people do appreciate the work, but letting this appreciation be known does no harm.
Hopefully someday I will be up there doing something for the community, and I will be the one who gets criticised. When that day comes I hope to remember this first experience, and realise that criticism can be important for improving even though it can hurt. There are also people who do appreciate the work and think it's great, but usually these are the quiet ones.
I won an award for the best node link animation in the VAST 2008 Competition. The competition consisted of a data set of phone calls between the families of the people running a controversial religious organisation, living on an island. The phone calls, retrieved from the island's phone company, provided enough data to extract the social network of the families on the island. In addition to this, each phone record had the time of the call, the duration of the call, and the location of the cell tower from where the call was made.
The tool, developed with Processing, was designed to allow easy exploration and interactive animation of a dynamic network. The network can be represented at different levels of detail. At an overview level, the whole network can be visualized using a matrix representation. From this overview, interesting parts of the network can be zoomed into and explored in detail using a node-link representation. Finally, the individual nodes can be studied at an instance level.
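To make the matrix overview concrete, here is a minimal Python sketch of how call records could be turned into a matrix of call counts. It is not the code of the Processing tool; the function name and the sample records are hypothetical, and the real data also carried time, duration and cell tower, which are left out here.

```python
def call_matrix(calls):
    """Build a caller-by-callee matrix of call counts.

    `calls` is a list of (caller, callee) pairs. Rows are callers and
    columns are callees, both in alphabetical order of the person id.
    """
    people = sorted({p for pair in calls for p in pair})
    index = {person: i for i, person in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for caller, callee in calls:
        matrix[index[caller]][index[callee]] += 1
    return people, matrix


# Hypothetical records, for illustration only.
calls = [("ann", "bob"), ("ann", "bob"), ("bob", "cat")]
people, matrix = call_matrix(calls)
for person, row in zip(people, matrix):
    print(person, row)
```

In a pixel matrix view each cell would be drawn as one coloured pixel, which is what lets the whole network fit on screen at the overview level.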
The award was given due to the "innovative visualizations, excellent analysis, and outstanding functionality demonstrated in the visual analytic environments" shown.
P.S. Guess this explains why I was so quiet in June.
I haven't seen the SecVis site reviewed in the main infovis sites. I think some of my pals in the security industry will find this interesting.
It might also be worth checking out DAVIX, which is a collection of security visualization tools. It allows you to do things like build maps from pcap files, map protocol use in real time across a network, etc.
The burger currency guideline from The Economist
This is one of the first visualization research images I created. It's a composite image of the same cult network, represented in different ways. The whole network is represented as a pixel matrix. Part of the network is represented as a node-link graph diagram. Finally the graph is a detailed view of an individual node in the network.
Processing is an easy-to-use programming language designed to make creating data visualizations easier. Processing simplifies the syntax of writing programs that draw graphics and use animation. The language was designed to be easy enough to be used by designers, and to abstract away most of the complication of writing the same functionality in Java. Each Processing application is finally converted into Java and can be either uploaded to a website as an applet or run as a standalone Java program.
The new baseball season has just started and like every year the race is on to win the World Series. Baseball is probably the richest sport when it comes to statistical data and analysis, yet for a sport so rich in statistical data, a search in the custom Google data visualization search engine and the infovis image search database yielded very few results. These are some of the more interesting baseball visualizations I found around.
Salary vs Performance - Ben Fry, one of the authors of the Processing programming language, uses his freely available tool to visualize which baseball teams are spending their money well, and how each team's position changes over the course of the season. The last applet uploaded looks at the teams and their salaries in 2007.
Baseball Visualization Tool - This is a commercial tool that uses a pie chart to guide the manager on whether to pull the pitcher or not. The fuller the pie chart, the stronger the case for changing the pitcher.
Baseball race - This visualization tracks the progress of each team in a season as the season progresses. The dataset used for this application starts from 1901 and continues till the present day. The data is freely available from Retrosheet, a baseball scores database.
Bivariate Baseball Score Plots - The bivariate baseball score plots present summary information for MLB teams' game scores, with each game shown as a point in a two-dimensional grid.
Chernoff Faces baseball managers - A visualization coming fresh off the press that uses Chernoff faces to display baseball manager stats. The features of the face like face height, width, nose size, mouth curvature, etc. change according to the values of the attributes they are representing.
Mitchell Report Visualization - In December 2007 a 409-page report was published detailing the use of steroids in Major League Baseball. A social network of connections between players and trainers mentioned in the Mitchell Report was created using Social Action, a tool developed by the HCI Lab at the University of Maryland.
Dr. Steve C. Wang used a data visualization technique called Chernoff faces to display some characteristics of baseball managers in 2007. The technique was developed by Herman Chernoff in 1973, and the idea behind it is to display different data attributes as facial features such as the curvature of the mouth, the length of the nose and the direction of the eyebrows. In Dr. Wang’s graphic, the number of lineups used by the manager is mapped to the length of the face and the width of the eyes and ears; the number of pinch-hitters is mapped to the width of the hair and the width of the face. Using this technique one can display many different attributes of a data set in a single face, then let the user compare the different faces to analyse the data. In fact Chernoff claims that up to 18 data elements can be displayed using this method, allowing the user to visually cluster the data.
How effective are Chernoff faces in conveying information? Maybe the faces do not convey information at first glance, and they need a lot of referencing to the face legend; however, I think they make an interesting and fun way of displaying information. The sole fact that this technique made the pages of the NY Times is enough proof of this. I’m sure that if the same data was displayed with bar graphs and pie charts it wouldn’t make any headlines. Most user studies in visualisation take into account the efficiency (speed in answering / accuracy of answer) of the technique; techniques like Chernoff faces may not be suited to answering questions fast, but they are catchy and media friendly.
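The core of the technique, mapping each data attribute onto a facial feature, can be sketched as follows. This is a simplified illustration and not Dr. Wang's actual mapping: it uses only two attributes and one feature per attribute, and the feature ranges, function names and manager figures are all made up for the example. Drawing the face itself is left out.

```python
# Assumed drawing ranges for each feature, as fractions of a unit face.
FEATURE_RANGES = {"face_height": (0.6, 1.0), "hair_width": (0.2, 0.8)}

# Which data attribute drives which facial feature (simplified).
ATTRIBUTE_TO_FEATURE = {"lineups": "face_height", "pinch_hitters": "hair_width"}


def normalise(values):
    """Rescale values to the range 0..1 across all records."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no variation: use the midpoint
    return [(v - lo) / (hi - lo) for v in values]


def face_parameters(records):
    """Return one dict of feature sizes per record."""
    faces = [{} for _ in records]
    for attribute, feature in ATTRIBUTE_TO_FEATURE.items():
        low, high = FEATURE_RANGES[feature]
        scaled = normalise([r[attribute] for r in records])
        for face, s in zip(faces, scaled):
            face[feature] = low + s * (high - low)
    return faces


# Made-up figures, for illustration only.
managers = [
    {"name": "Manager A", "lineups": 80, "pinch_hitters": 100},
    {"name": "Manager B", "lineups": 120, "pinch_hitters": 250},
    {"name": "Manager C", "lineups": 100, "pinch_hitters": 175},
]
for manager, face in zip(managers, face_parameters(managers)):
    print(manager["name"], face)
```

Because every attribute is rescaled relative to the other records, the faces only make sense when compared with each other, which is exactly how the graphic is meant to be read.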
Nat Torkington from O'Reilly Radar published an interesting weekly roundup post of the Data Mining and Visualization posts. The most interesting posts mentioned are: Catching a poker cheat with data mining, SNA toolkit for R and a link to a machine learning blog called Machine Learning (Theory)
The title says it all really. Meryl.net published a very long list of visualization examples, blogs and influential vis people. Worth a look.
In no. 28 there's the Felton Annual report which I was planning to blog about, some time in the future. It's a personal annual report presented in a very creative way.
The numbers speak for themselves – Social Networking Sites are popular all over the world.
- In 90 (79%) countries a major social networking site features in the top 10 sites of that country.
- In 19 of these countries, the social networking site is the highest ranking site in the country – ranking higher than any search engine.
- From a sample of 116 countries only 2 (Taiwan and Vietnam) didn’t include a popular social networking site in the list of the top 100 websites.
The popularity of social networking sites is no surprise, and several statistics (1, 2, 3) have been published about the major social networking sites like Facebook and MySpace. There are however few reports on the use of these sites by geographic region. The only geographic distributions I came across were from Comscore, the Social Network Sites paper published in JCMC, and ValleyWag.
Using the Many Eyes platform I created three different visualizations of the most popular SN sites used in each country. The data used for determining the country popularity was collected from Alexa ratings. For more information on how the data was extracted see – how to collect geographic website rankings from the internet.
The world map is a colour coded map with each social networking site represented in a different colour. Where data wasn’t available, the country border is not displayed. If you click on a site from the list on the right, the countries that use that site are highlighted.
This second display shows a rectangular table display (treemap) of the data divided either by social networking platform, or by country. To switch between the displays, reorder the treemap hierarchy by dragging the ordering at the top of the visualization display.
The third visualization shows the ranking of the social network sites, and the number of internet users in each region. In the darker coloured regions, social networking sites ranked higher than other websites. The size of each rectangle is proportional to the number of internet users in the country: the bigger the rectangle, the more users there are.
Which visual representation of the data set do you prefer, and why? Do you think that one of the displays is superior to the others? Can you think of other different ways to present this data graphically? The aim of this exercise is to display some interesting data using Many Eyes and stimulate discussions on the different visualizations and data presented.
If you’d like to voice your comments, comment on the specific visualization by clicking the comment link in the respective visualization. The data used to generate the results is freely accessible on the Many Eyes site. You can use the uploaded data to create other visualizations in Many Eyes. After all, if you reply with a picture it’s like you’re writing a thousand words, isn’t it?
Do you want to know which files are filling your disk? Do it with ease and style with SequoiaViewer. This small, free application maps out the files on a disk and represents them as squares whose size depends on the size of the file. Files are grouped according to their containing folder, and different colour schemes can be selected to colour code different file types. If you hover the mouse over a square you'll see the name of the file.
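For the curious, the idea of giving each file an area proportional to its size can be sketched with the classic slice-and-dice treemap layout. This is an assumption-laden illustration, not the application's own algorithm (which produces squarer shapes): the function names and the sample folder are hypothetical, and a folder is modelled as a nested dict with file sizes as numbers.

```python
def size(node):
    """Total size of a file (a number) or a folder (a dict)."""
    if isinstance(node, dict):
        return sum(size(child) for child in node.values())
    return node


def treemap(node, x, y, w, h, horizontal=True, path=""):
    """Yield (path, x, y, w, h) for every file under `node`.

    The rectangle is split among the children in proportion to their
    size, and the split direction alternates at each folder level.
    """
    if not isinstance(node, dict):
        yield (path, x, y, w, h)
        return
    total = size(node)
    offset = 0.0  # fraction of the rectangle already handed out
    for name, child in node.items():
        share = size(child) / total if total else 0.0
        child_path = f"{path}/{name}" if path else name
        if horizontal:
            yield from treemap(child, x + offset * w, y, share * w, h,
                               False, child_path)
        else:
            yield from treemap(child, x, y + offset * h, w, share * h,
                               True, child_path)
        offset += share


# Hypothetical folder, sizes in arbitrary units.
disk = {"photos": {"a.jpg": 30, "b.jpg": 10}, "notes.txt": 60}
for rect in treemap(disk, 0, 0, 100, 100):
    print(rect)
```

Files in the same folder end up next to each other because a folder's rectangle is carved out first and then subdivided among its contents.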
If you like this unusual information representation you might want to check this site: information aesthetics
This article is part of the Tip of the day project