Collective intelligence in Open Source teams: insights from social network analysis of GitHub

Academic year: 2021

Dorota Celińska

Faculty of Economic Sciences, University of Warsaw

dcelinska@wne.uw.edu.pl

Introduction

The Open Source software license allows end users to study, modify, and distribute publicly accessible source code to anyone and for any purpose. These principles effectively lower the cost of participating in projects for volunteers, creating both a friendly environment for beginners and signaling opportunities for professionals. Despite being a relatively new website, GitHub has become the largest repository hosting service for Open Source software development, and it features elements of a social network service. As of Dec 31, 2016, GitHub had more than 15 million registered users and hosted over 40 million repositories. The popularity of the service and the relative ease of obtaining data about events that occur among its users make GitHub a rich and promising source of information for researchers.

Collective intelligence is a form of universal distributed intelligence arising from the collaboration and competition of many individuals. It measures a group's ability to solve complex tasks. Individuals process different pieces of information through social interactions, which eventually leads to a solution to a cognitive problem. Intuitively, the potential for success of the Open Source community emerges from the underlying collective intelligence within teams.

Example social networks of GitHub

GitHub is a collaborative repository hosting service that includes social features bringing a new transparency to the development project [MBPG14]. The activity of its registered users forms several kinds of social networks:

• The most intuitive one is the network of collaboration among developers within project repositories. Because of the special collaboration model of the GitHub repository hosting service, this network is embedded in the observable network of users who are granted permission to contribute to the projects, i.e. members.

• The network of followers – users who agreed to be sent notifications about one's activity on GitHub. The network of followers may serve as a proxy for influence and reputation processes in the service.

• Similar to the network of followers, yet differing in the kind of information provided, is the network of watchers. A watcher is a follower of a particular repository. While notifications stemming from following a developer supply quite a diverse range of information, the alerts for watchers are confined to the activity within one repository (mostly issues, bug fixes, and collaboration within the project).
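As a rough illustration of how such a network can be represented, the sketch below builds a directed follower network from (follower, followed) pairs. The event format, the user names, and the helper names `build_follow_network` and `followers_of` are assumptions made for this example only; they are not taken from the data set or from the GitHub API.

```python
from collections import defaultdict

def build_follow_network(follow_events):
    """Return adjacency sets: follower -> set of users they follow."""
    adjacency = defaultdict(set)
    for follower, followed in follow_events:
        adjacency[follower].add(followed)  # repeated events collapse into one edge
    return dict(adjacency)

def followers_of(adjacency, user):
    """Users with an edge pointing at `user` (the in-neighbours)."""
    return {source for source, targets in adjacency.items() if user in targets}

# Hypothetical events; the duplicated pair shows that edges are kept unique.
events = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("alice", "bob")]
network = build_follow_network(events)
```

A watcher network could be stored the same way, with repositories instead of users as edge targets.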

Dataset

The GitHub data is huge and the service evolves continuously, so a complete download of the data is impossible. To minimize the number of missing observations, we use a data set combined from three sources:

• GHTorrent project [GS12, Gou13];

• GitHub Archive project [Gri12];

• our own database obtained by web scraping GitHub in 2016.

To combine the data, we used a set of heuristics. The resulting data set contains information about the activity of 10,620,313 users in 42,636,285 repositories. GitHub offers both paid plans for private repositories and free public repositories on the same account, yet information about private repositories is not available. This explains the difference between the number of users in our sample and the number of actual registered users.
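The heuristics themselves are not described here, so the following is only a hypothetical sketch of one plausible rule: records are keyed by user login, earlier sources take priority, and later sources only fill in fields that are still missing. The function name `combine_sources`, the field names, and the sample records are all assumptions for illustration.

```python
def combine_sources(*sources):
    """Merge user records keyed by login.

    Earlier sources take priority; later ones only fill fields that are
    still missing (absent or None). This rule is an assumed heuristic.
    """
    combined = {}
    for source in sources:
        for login, record in source.items():
            merged = combined.setdefault(login, {})
            for field, value in record.items():
                if merged.get(field) is None and value is not None:
                    merged[field] = value
    return combined

# Hypothetical records standing in for the three sources.
ghtorrent = {"alice": {"followers": 10, "location": None}}
gh_archive = {"alice": {"followers": 12, "location": "Warsaw"},
              "bob": {"followers": 3}}
scraped = {"carol": {"followers": 1}}
users = combine_sources(ghtorrent, gh_archive, scraped)
```

Combining sources this way reduces missing observations because a field absent in one source can be recovered from another.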

Structure, rich-clubs and assortativity

| Network | Number of nodes | Number of unique nodes | Number of edges | Number of self-loops | Average one-sided degree | Reciprocity (in %) |
|---|---|---|---|---|---|---|
| following (f) | 3,247,109 | 597,312 | 15,944,720 | 1,139 | 4.91 | 8.71 |
| starring (s) | 3,697,451 | 652,911 | 61,163,902 | 910,862 | 16.54 | 0.31 |
| forking (m) | 5,435,202 | 1,325,182 | 24,276,564 | 1,013,221 | 4.47 | 0.70 |
| issues (i) | 3,466,511 | 520,293 | 8,239,025 | 876,002 | 2.38 | 0.25 |
| pulls (pr) | 2,263,991 | 0 | 4,787,584 | 1,010,999 | 2.12 | 1.02 |
| comments (c) | 2,427,575 | 139,710 | 7,341,043 | 502,269 | 3.02 | 0.38 |
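The statistics in the table above can be sketched on a toy edge list. The average one-sided degree appears to be the number of edges divided by the number of nodes (for the following network, 15,944,720 / 3,247,109 ≈ 4.91). The definition of reciprocity used below, the share of edges whose reverse edge also exists, is an assumption, as is the choice to count duplicate edges once; the function name `describe_network` is hypothetical.

```python
def describe_network(edges):
    """Summary statistics for a directed edge list of (source, target) pairs."""
    edge_set = set(edges)  # assumption: duplicate edges are counted once
    nodes = {node for edge in edge_set for node in edge}
    self_loops = sum(1 for s, t in edge_set if s == t)
    # Assumed definition: an edge is reciprocated when its reverse edge
    # also exists; self-loops are not counted as reciprocated.
    reciprocated = sum(1 for s, t in edge_set if s != t and (t, s) in edge_set)
    return {
        "nodes": len(nodes),
        "edges": len(edge_set),
        "self_loops": self_loops,
        "avg_one_sided_degree": len(edge_set) / len(nodes),
        "reciprocity_pct": 100.0 * reciprocated / len(edge_set),
    }

# Toy network: a<->b is reciprocated, a->c is not, c->c is a self-loop.
stats = describe_network([("a", "b"), ("b", "a"), ("a", "c"), ("c", "c")])
```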

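| Network | Degree Assortativity r_d | Degree Assortativity ρ_d | In-Degree Assortativity r_in | In-Degree Assortativity ρ_in | Out-Degree Assortativity r_out | Out-Degree Assortativity ρ_out |
|---|---|---|---|---|---|---|
| following (f) | -0.127 | -0.009 | -0.104 | -0.044 | 0.030 | 0.098 |
| starring (s) | 0.006 | -0.024 | -0.016 | -0.027 | 0.082 | 0.071 |
| forking (m) | -0.036 | -0.012 | -0.051 | -0.061 | 0.110 | 0.124 |
| issues (i) | -0.476 | -0.268 | 0.154 | 0.133 | -0.144 | -0.040 |
| pulls (pr) | 0.071 | 0.146 | 0.067 | 0.073 | 0.164 | 0.163 |
| comments (c) | 0.033 | 0.040 | -0.004 | -0.015 | 0.138 | 0.122 |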
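The symbols r and ρ are not defined in this text. The sketch below assumes that r is the Pearson correlation and ρ the Spearman rank correlation, each computed over edges between the out-degree of the source node and the out-degree of the target node. Both the interpretation and the helper names are assumptions for illustration, not the author's stated method.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def out_degree_assortativity(edges):
    """Return (Pearson r, Spearman rho) of source vs target out-degree."""
    out_degree = Counter(source for source, _ in edges)
    xs = [out_degree[s] for s, _ in edges]
    ys = [out_degree[t] for _, t in edges]  # nodes with no out-edges count as 0
    return pearson(xs, ys), pearson(average_ranks(xs), average_ranks(ys))

# Toy network in which the most active node links to less active ones.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a")]
r_out, rho_out = out_degree_assortativity(toy_edges)
```

Negative values, as in this toy network, indicate that nodes with many outgoing edges tend to point at nodes with few, which is the disassortative pattern.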

Hyperbolic Self-Organizing Maps

• We aggregate diversity with Hyperbolic Self-Organizing Maps (HSOM) over a quotient space

• For each network, we simulate the distances between nodes with 1,000 HSOMs and then compute the average number of edges supporting a given distance

• We repeat the same simulation for 100 random networks for every network considered in the study
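The averaging step in the list above can be sketched as follows. The code does not train an HSOM: it assumes that each run has already placed every node in a map cell, and it uses a toy distance (cells as integers on a line) in place of the distance between cells of a hyperbolic tiling. All names and data here are hypothetical.

```python
from collections import Counter, defaultdict

def edges_per_distance(edges, placement, distance):
    """Count edges by the map distance between their endpoints in one run."""
    return Counter(distance(placement[s], placement[t]) for s, t in edges)

def average_edges_per_distance(edges, placements, distance):
    """Average the per-distance edge counts over several map runs."""
    totals = defaultdict(float)
    for placement in placements:
        counts = edges_per_distance(edges, placement, distance)
        for d, count in counts.items():
            totals[d] += count
    return {d: total / len(placements) for d, total in sorted(totals.items())}

def line_distance(cell_a, cell_b):
    """Toy stand-in for the map distance: cells are integers on a line."""
    return abs(cell_a - cell_b)

edges = [("a", "b"), ("b", "c"), ("a", "c")]
# Two hypothetical runs, each mapping a node to the cell it landed in.
runs = [{"a": 0, "b": 1, "c": 2}, {"a": 0, "b": 0, "c": 1}]
averages = average_edges_per_distance(edges, runs, line_distance)
```

Running the same function on placements of random networks would give the baseline against which the observed counts are compared.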

Collaboration and similarity

[Figure: two panels plotting value against Distance (x-axis from 0 to 7.5), with one series per network: com, fol, fork, iss, pull, str.]

[Figure: four density plots of overall (x-axis from 0 to 7.5), each split into no/yes groups by had_or_not_com, had_or_not_forks, had_or_not_iss, and had_or_not_pulls.]

Discussion and further work

We are currently comparing the results obtained with different tilings. We also apply different metrics of distance among developers in the projects.

References

[Gou13] Georgios Gousios. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR '13, pages 233–236, Piscataway, NJ, USA, 2013. IEEE Press.

[Gri12] Ilya Grigorik. GitHub Archive. https://www.githubarchive.org/, 2012. Accessed Dec 5, 2016.

[GS12] Georgios Gousios and Diomidis Spinellis. GHTorrent: Github's data from a firehose. In Michele Lanza, Massimiliano Di Penta, and Tao Xie, editors, 9th IEEE Working Conference on Mining Software Repositories (MSR), pages 12–21. IEEE, June 2012.

[MBPG14] Nora McDonald, Kelly Blincoe, Eva Petakovic, and Sean Goggins. Modelling distributed collaboration on github. Advances in Complex Systems, 17(07n08):14500–14524, 2014.

Interactive version of this visualization: http://coin.wne.uw.edu.pl/dcelinska/en/pages/phd-thesis.html

More about RogueViz: http://www.roguetemple.com/z/hyper/rogueviz.php

Acknowledgment. This work was supported by the National Science Centre, Poland, grant DEC-2016/21/N/HS4/02100.
