Open Source software and the network effects


University of Warsaw

Faculty of Economic Sciences

Dorota Celińska-Kopczyńska

Open Source software and the network effects

Ph.D. dissertation

Supervisors:

Prof. Dariusz T. Dziuba, Ph.D.

Department of Business Informatics and Economic Analysis, FES UW

Tomasz Kopczewski, Ph.D.

Chair of Microeconomics, FES UW

December 7, 2018


Abstract

The Open Source software license allows end users to study, modify, and distribute publicly accessible source code to anyone and for any purpose. The production of this kind of software usually relies on volunteer contributions and leads to the creation of communities, i.e., groups of people interested in or working on specific projects.

Communities form social networks where innovations are created and distributed.

The empirical research regarding Open Source software usually focuses on extracting the determinants of the Open Source license choice made on an enterprise or a project level. Additionally, substantial literature revolves around explaining the incentives motivating individual developers to contribute to Open Source projects. Economists focus on the public-good nature of this kind of software. Classical economic analysis perceives Open Source as a kind of anomaly: an unusual definition of ownership expressed in the license agreements and the usual lack of direct remuneration for developers should discourage them from making efforts; thus the source code should not be created. The behavior of developers is quite the opposite. Although Open Source products are also subject to network externalities, this issue is surprisingly rarely mentioned in the relevant literature.

Recently implemented, GitHub is currently one of the largest repository hosting services related to the development of Open Source software, featuring elements of a social network service. As of Dec. 31, 2016, GitHub has more than 15 million registered users and hosts over 40 million repositories. Not all repositories on GitHub are intended for collaboration – many are there just for personal or storage purposes.

The aims of this dissertation are twofold. First, we want to introduce a generalized definition of the network effects, which would allow for the inclusion of information generated by the underlying network structures while modeling. Secondly, we want to investigate the factors which influence the strength of network effects within the Open Source community of GitHub.

In this dissertation, we will analyze the collaboration among Open Source developers in GitHub with a particular focus on the network effect. In the traditional understanding, network effects arise when a consumer’s utility increases with the number of other consumers purchasing the same good. Under direct network effects, the number of other consumers directly affects consumer choices; classic examples include telephones or computer hardware. Under indirect network effects, a consumer does not gain directly from the number of other consumers; rather, the increasing market share creates more incentives to improve the quality of the product itself or to provide additional post-purchase services. Indirect network effects may lead to standardization. Consumer expectations regarding the size of the networks play a crucial role in the emergence of network effects.

However, the traditional approach barely addresses the “network” origin of the network effects, focusing only on the number of users, regardless of their actual connections. That is why we introduce a redefinition of the network effect separating a potential component (related to the number of users) from the local component (related to the type or quality of connections and the topology of the network). We assume the presence of network effects within Open Source communities.

Our redefinition allows us to analyze network effects in a broader sense. We are no longer interested in proving the existence of network effects; instead, we may recast the question of what factors influence the strength of the network effects as an investigation of the link creation processes. This is possible because we take into account who connects with whom.

We discuss how diversity and homophily affect the performance of the teams and the creation of links in social networks of GitHub. We combine the classical Social Network Analysis (e.g., assortativity, rich-club, structural characteristics) with econometric analyses. Since downloading the complete GitHub data is impossible, we combine our dataset from three sources: GHTorrent, GitHub Archive, and our own data set obtained by web-scraping GitHub. Recently, it was shown that hyperbolic geometry is intrinsic in many real-world networks and especially useful for modeling large scale-free networks based on similarity and popularity. For this reason, we address similarity vs. diversity among developers with non-Euclidean Self-Organizing Maps, in particular the version utilizing the Klein quartic. The exponential growth that characterizes hyperbolic geometry affects the visualization and modeling of the neighborhoods in data: objects sharing the same properties tend to be mapped close together. Using a logistic regression model, we investigate the impact of developers’ characteristics on the probability of collaborating with them. Additionally, we conduct text mining analyses on the content provided by GitHub users.

We also show that the structures of networks in the service are not independent and find the temporal patterns in developers’ behavior.

Our findings suggest that diversity plays a crucial role in the creation of links among users who exchange information (e.g., in issues, comments, and following networks). On the contrary, similar users establish the connections in networks related to actual coding. Open Source developers are driven by various forms of the network effect, e.g., standardization or lock-in effects. Reputation and mutuality also play an essential role in choosing with whom to collaborate. We also provide some new algorithmic solutions, especially in the area of applying hyperbolic geometry to economic research.


Summary (Streszczenie)

Open Source software guarantees public access to the source code, allowing users to analyze it, compile it on their own, and modify it. Its creation is tied to communities: groups of people interested in and engaged in the development of projects. They form a network structure in which innovations spread. Collaboration leading to a new product (a program, or the application of fixes) is a generator of innovation.

Empirical research on Open Source software usually focuses on finding the determinants of why firms or project creators choose this type of license. The factors that lead developers to co-create publicly released code have also been the subject of a considerable number of articles. Economists often reduce the creation of Open Source software to a problem of public-goods distribution; this approach, however, does not explain the complexity of the phenomenon. From the point of view of classical economics, Open Source is a challenge: the way property rights are defined in the license, combined with the frequent lack of direct remuneration, should discourage developers or promote free-rider strategies. Instead, we observe the completely opposite behavior. Although Open Source software is subject to network effects, this topic is surprisingly rarely addressed in the literature.

GitHub is currently the largest service gathering Open Source software. It not only provides repository storage but also offers elements of a social networking service. As of December 31, 2016, more than 15 million users were registered there, working on more than 40 million projects. Not all creators publish their repositories expecting collaboration with others; many codebases are for personal use only or serve as backups.

This dissertation analyzes the collaboration among developers of Open Source software, with particular attention to the network effect. In the traditional understanding, a network effect is a situation in which the utility derived from purchasing a good (taking an action) is an increasing function of the number of people who have already bought the product (taken the same action). Unfortunately, the traditional understanding of the term makes practically no use of the possibilities and information offered by the network structure that generates this externality. The definition focuses solely on the number of other people and ignores the relationships among them. An alternative definition is needed: one that makes it possible to include in modeling more of the information carried by the observed network structures. In the network effect as defined in this dissertation, one can distinguish a component related to the potential network (the classical understanding of the term) and a component resulting from the available information, and its quality, about the local network, i.e., the consumer's neighborhood in a directed multigraph (multidigraph).

The second aim of the dissertation is to discover the factors that determine the strength of the network effect among developers active on GitHub. Standard research methods come down to determining its strength from survey data or from observed (simulated) choices; price or another monetary cost is often one of the important factors. In the case of Open Source software such determinants usually have no influence; even the “valuation,” in economic terms, of a volunteer's decision to become active in a project is a challenge. It therefore turns out that the second aim of the dissertation is inseparably linked to the first. Thanks to the redefinition introduced, we can easily connect network research with economic analysis. Assuming the presence of network effects among Open Source developers (which has already been shown in the literature) and observing that the network effect depends on the number and quality of links in the network, we can reduce the question of which factors favor the network effect to the problem of what favors the creation of links in the network.

Using data from GitHub, I analyze user behavior within a multilayer network. I distinguish networks (layers) related to following a user, giving the user stars, copying the user's repository (fork), submitting proposed changes (pull request), reporting problems (issues), and commenting. The first two layers concern the flow of information about a user's activity and the formation of the user's reputation; the remaining ones are related to collaboration in a programming project.

In the dissertation I examine whether and how diversity and homophily affect collaboration (network formation) among GitHub users engaged in Open Source software. To achieve the stated aims I use both quantitative and qualitative methods. The analytical techniques include logistic regression models, social network analysis, multivariate analysis techniques, text analysis, and generalized Kohonen networks. In most cases, to show how far the results can be generalized, the models are validated by simulation.

The dissertation is interdisciplinary. Beyond discussing a problem from microeconomic theory, I devote considerable attention to efficient methods for visualizing and processing large volumes of data. Among other things, I consider which manifold is best suited to modeling the networks formed by GitHub users. The results suggest that the Euclidean plane is the least suitable choice.

The results indicate that both diversity and similarity play an important role in the formation of networks among GitHub users. Diversity supports the development of links in networks related to information exchange (reporting problems, comments, and following); collaboration on code usually occurs between people with similar characteristics. Within projects, collaboration between dissimilar people consists of reporting bugs and usage problems more often than collaboration between similar people does. Developers' behavior is also influenced by the standardization present among programming languages. In the dissertation I suggest new research techniques for problems that have not been posed before (sequence analysis, generalized Kohonen networks, text analysis using hyperbolic geometry). The results are representative of the population of people creating Open Source software on GitHub.


Eryk Kopczyński
Institute of Informatics
Faculty of Mathematics, Informatics, and Mechanics
University of Warsaw

Warszawa, November 13, 2018

Coauthor statement

This statement relates to the following research papers:

[CK17] Dorota Celińska and Eryk Kopczyński. Programming languages in GitHub: A visualization in hyperbolic plane. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada, May 15-18, 2017, pages 727–728. AAAI Press, 2017.

[KC17] Eryk Kopczyński and Dorota Celińska. Hyperbolic grids and discrete random graphs. CoRR, abs/1707.01124, 2017.

[KCv17] Eryk Kopczyński, Dorota Celińska, and Marek Čtrnáct. HyperRogue: Playing with hyperbolic geometry. In David Swart, Carlo H. Séquin, and Kristóf Fenyvesi, editors, Proceedings of Bridges 2017: Mathematics, Art, Music, Architecture, Education, Culture, pages 9–16, Phoenix, Arizona, 2017. Tessellations Publishing. Available online at http://archive.bridgesmathart.org/2017/bridges2017-9.pdf.

[KC18] Eryk Kopczyński and Dorota Celińska. Virtual Crocheting of Euclidean Planes in a 3-Sphere. In Eve Torrence, Bruce Torrence, Carlo Séquin, and Kristóf Fenyvesi, editors, Proceedings of Bridges 2018: Mathematics, Art, Music, Architecture, Education, Culture, pages 551–554, Phoenix, Arizona, 2018. Tessellations Publishing. Available online at http://archive.bridgesmathart.org/2018/bridges2018-551.pdf.

Generally speaking, most of the work above is only loosely related to mgr Dorota Celińska-Kopczyńska’s dissertation. Not counting the basic definitions (hyperbolic geometry) most of them are basically independent. Only one published paper [CK17], and our yet unpublished results concerning self-organizing maps, have been used in the dissertation to a serious extent. These results make a very small part of the dissertation (about 15%) and my contribution there was much smaller than Dorota’s – I have worked only on the technical details and the implementation of aspects directly pertaining to the non-Euclidean geometry (not counting small help with editing). Most of the dissertation is a result of the individual work by the Ph.D. candidate. Specifically, I have no contribution in the parts of the dissertation and papers pertaining to the theory of economics nor econometrics.

All papers listed above involve our mutual projects HyperRogue (a game in the hyperbolic plane) and RogueViz (a tool for data visualization and data analysis based on non-Euclidean geometry, built on HyperRogue’s engine). The game itself [KCv17] is not directly related to the dissertation – most of the programming here has been done by me, although many features are based on the ideas coming from the Ph.D. candidate, Marek Čtrnáct, and (to a lesser extent) other people mentioned in the acknowledgments. The visualization [KC18] is not related to the dissertation. In both papers, Dorota has taken part in the creation and editing of text (in particular, she has written the crocheting part of [KC18], used also in the dissertation). She has presented [KCv17] at the Bridges conference during the regular session and during the Family Day event.

It was also her idea to use HyperRogue’s engine for non-Euclidean data analysis and visualization. In particular, it was her idea to distribute the languages used in GitHub in the vertices of a hyperbolic tessellation, based on similarity (here: being used by the same people). This idea has been used in our demo presentation at the ICWSM 2017 conference [CK17]; we have also used our technique to visualize several other datasets, some of which appear in the dissertation. My only contribution here was the design of a simulated annealing algorithm to distribute the languages, which was a decidedly minor contribution – mgr Dorota Celińska-Kopczyńska has collected and processed the data, found a way to determine the language similarity based on this data, and took part in the creation and editing of the paper, which was decidedly a greater part. The result has been presented at ICWSM 2017 by both of us. The results have also been presented in the form of a poster (created by Dorota) at the NetSci 2017 conference.

My contribution to the paper [KC17] consists of the algorithms for computing distances in the hyperbolic graphs; these algorithms did not have any major relation to the dissertation, and they have not been used there. This work was presented at a local conference, Forum Informatyki Teoretycznej 2017 (FIT2017). Other than the idea for the research direction, the Ph.D. candidate has collected, processed, and described data regarding GitHub and the FIT2017 coauthorship network, and she took part in the text creation, text editing, and creating the presentation.

The Ph.D. dissertation also discusses the use of non-Euclidean geometry for the construction of self-organizing maps (Kohonen’s algorithm). We intend to publish a series of papers based on this research. As in [CK17], my only contribution here was an efficient implementation of the Kohonen algorithm itself, as well as the technical details (not counting the part of hyperbolic triangulations which has been created for use in HyperRogue) – the idea (research direction, use of non-Euclidean geometry in Kohonen’s algorithm, use of SOM distance as a measure of similarity), collection and processing of data, statistical treatment of results, data analysis, and the writing of the text (of the dissertation) have been performed by the Ph.D. candidate, so her contribution in this work was also decidedly greater.

I estimate the percentage contribution in the published papers as: [CK17] DCK 90%, EK 10%; [KCv17] EK 70%, DCK 25%; [KC17] EK 75%, DCK 25%; [KC18] EK 85%, DCK 15%. I emphasize that the papers where my contribution was greater are only very loosely related to the Ph.D. dissertation, and only the basic definitions have been used in the dissertation.

Eryk Kopczyński


Introduction

The Open Source software license allows the end users to study, modify, and distribute the publicly accessible source code to anyone and for any purpose. The creation of this kind of software usually relies on volunteer contributions and is associated with the emergence of communities, i.e., groups of people interested in or working on specific projects [Ray03]. Communities form social networks, where innovations are created and distributed. By innovation, we mean “the process of commercialization of a newly developed or adopted product or practice” [FS97]. In particular, collaboration among developers which leads to the creation of a new product (new software, or improvements to the previous one) can be viewed as a process which generates innovations.

Open Source is widespread and important. An enormous number of people rely on Open Source software on an everyday basis. They may not even realize it.

Android, a smartphone operating system, is derived from Linux. Mozilla Firefox is among the five most widely used web browsers, while Apache HTTP Server and Nginx serve most websites. Open Source software powers cars, backs the security of transactions, and has even become a key player in business analytics. It serves as an example of how small incidents may lead to significant consequences in a relatively short time. Initially started by a group of technician-enthusiasts and accused of spoiling the software market, nowadays Open Source attracts giants like Google, Microsoft, or Facebook. At the same time, it remains accessible and democratic.

Anyone can start or join a project; moreover, contributions are not limited to source code writing. Collaboration is also about reporting bugs or requesting features.

In the digitalizing markets, Open Source becomes inevitable. Economics still treats it as something that should not be – an odd creature. The empirical research regarding Open Source software usually focuses on extracting the determinants of the Open Source license choice made on an enterprise or a project level [LT02; BR03; Kos07; Cro+08]. Additionally, substantial literature revolves around explaining the incentives that motivate individual developers to contribute to Open Source projects [HO02; YK03; BR03; KOT14]. Economists focus on the public-good nature of this kind of software. Open Source is perceived as a kind of anomaly from the standpoint of classical economic analysis: an unusual definition of ownership expressed in the license agreements and the usual lack of direct remuneration for developers should discourage them from making efforts; thus the source code should not be created. However, the behavior of developers is quite the opposite. Although Open Source products are also subject to network externalities [BR03; Pop07; Cel16b], this issue is surprisingly rarely mentioned in the relevant literature.

Socially connected computing is rooted in the study of corporate portals and groupware [GP14]. A vast amount of research has been conducted on SourceForge, an Open Source repository that started in 1999 [LW05; FG11; MB12]. SourceForge, despite its popularity, lacks features to make social ties and keep up with other developers’ updates [Lee+13; Cel16a]. More recently implemented, GitHub is currently one of the largest repository hosting services related to the development of Open Source software, featuring elements of a social network service. As of Dec. 31, 2016, GitHub has more than 15 million registered users and hosts over 40 million repositories. Not all repositories on GitHub are intended for collaboration – many are there just for personal or storage purposes [Kal+14]. Therefore, understanding what makes a developer attractive for collaboration is essential and interesting.

In the traditional understanding, network effects arise when a consumer’s utility increases with the number of other consumers purchasing a given good. Under direct network effects, the number of other consumers directly affects consumer choices; classic examples include telephones or computer hardware. Under indirect network effects, a consumer does not gain directly from the number of other consumers; rather, the increasing market share creates more incentives to improve the quality of the product itself or to provide additional post-purchase services. Indirect network effects may lead to standardization. Consumer expectations regarding the size of the networks play a crucial role in the emergence of network effects.

Information goods are usually subject to network effects. Researchers mainly focus on the standardization problems, e.g., which of two competing technologies will constitute a standard. Many studies are also devoted to determining market equilibria in the case of network effects, the number of enterprises staying in the market, and the structure of their output. The traditional framework for network effect models is sufficient for tangible durables; however, one can hardly perceive software as such a product. The traditional model also fails to describe situations in which a consumer is interested not only in how many consumers purchased a given good but also in who they are; models of local network effects offer mitigating solutions in this case. However, even if “networks” are present in the wording of the term, their application is usually reduced to the number of nodes (the number of participants) in the network.

The aims of this dissertation are twofold. First, we want to introduce a generalized definition of the network effects, which would allow for the inclusion of information generated by the underlying network structures during modeling. Secondly, we want to investigate the factors which influence the strength of network effects within the Open Source community of GitHub.

Although there have been several attempts to redefine the network effects in the literature (the most successful one by Sundararajan [Sun08]), they seem disconnected from one another. There is a need for a coherent approach which would arrange them. We will distinguish between the potential network effects, which will capture the popularity of the action (preserving the traditional meaning of the term), and the local ones, which will result from the topologies of the networks and the actions among individuals. Our approach relies heavily on the ideas of Sundararajan [Sun08]; our contributions are stating the problem in the language of utilities, generalizing the networks to multidigraphs, and making the original results obtainable in a simpler and more realistic way than has been suggested so far.
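The distinction between the potential and the local component can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions and not the definition developed in the dissertation: the additive form of the utility, the parameter `alpha`, the per-layer weights, and the use of incoming links as the "local" neighborhood are all hypothetical choices, as are the user names.

```python
from collections import Counter

def network_utility(user, edges, n_users, alpha=0.01, layer_weights=None):
    """Toy utility split into a potential and a local component.

    edges: iterable of (source, target, layer) triples; parallel edges
    are allowed, so the structure is a directed multigraph.
    The additive functional form is an assumption made for illustration.
    """
    layer_weights = layer_weights or {}
    # Potential component: depends only on how many other users exist.
    potential = alpha * (n_users - 1)
    # Local component: weighted count of links pointing at `user`, per layer.
    incoming = Counter(layer for _, target, layer in edges if target == user)
    local = sum(layer_weights.get(layer, 1.0) * count
                for layer, count in incoming.items())
    return potential + local, potential, local

# Hypothetical three-user network with a repeated (parallel) edge.
edges = [
    ("ann", "bob", "follow"),
    ("cat", "bob", "follow"),
    ("ann", "bob", "pull_request"),
    ("ann", "bob", "pull_request"),
    ("bob", "cat", "issue"),
]
total, potential, local = network_utility(
    "bob", edges, n_users=3, alpha=0.5,
    layer_weights={"follow": 1.0, "pull_request": 2.0},
)
print(total, potential, local)
```

Two users in networks of the same size receive the same potential component here, while the local component differs with who links to them and through which kind of link, which is the information the traditional definition discards.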

Empirical studies on network effects focus on estimating the strength of those effects. The presence of network effects in Open Source communities has already been examined and proven, e.g., by Bonaccorsi and Rossi [Ros06], Popovici [Pop07] or Celińska [Cel16b]. The author finds this problem uninteresting; therefore we will not pay particular attention to the proofs of network effects. Instead, we will assume that there are generalized network effects within the Open Source community.

This way we will benefit from the suggested redefinition. Since we suggest that the network effect results from the underlying structures which are shaped by the interdependence of the individuals’ decisions, who connects with whom becomes a crucial stage in the valuation of the network effects. This way, we can redefine a vast part of the problem into link creation research. If we know what drives the establishment of links in the network, we also know the factors which influence the strength of the network effects.

This research is exploratory, which is why we do not state a single hypothesis. Instead, we formulate research questions (if a formal statistical inference is not possible) or auxiliary hypotheses. The spectrum of methods we will use to accomplish our objective includes both quantitative methods, e.g., econometric models, data mining, and social network analysis, and qualitative ones, e.g., text analysis. We will utilize both primary and secondary data sources. Most of the analysis will be carried out with a dataset combined from three sources: GHTorrent, GitHub Archive, and data collected with our web-scraper.

Our research population will be users of the GitHub repository hosting service who contribute to Open Source software. Since GitHub continuously evolves, it is not possible to collect the data within a sufficiently short time. For this reason we will limit the span of registrations to 2008–2014. We will analyze the behavior of volunteers. For short, we will call people who engage in the creation of Open Source software “Open Source developers”. In the same way, “GitHub users” will denote people registered in GitHub. As we will explain later, one can assume that the public repositories in GitHub are by default Open Source. That is why in our context we may use “Open Source developer” and “GitHub user” interchangeably.

In this dissertation, we will investigate how diversity and similarity influence the collaboration among GitHub users who engage in the creation of Open Source software. We will verify the following hypotheses:

1. The degree distributions in the networks considered in this study exhibit power-law scaling behavior.

2. There exist statistically significant correlations among the structures of the networks occurring within developers’ community in GitHub.

3. The reputation proxies positively affect the probability that others would like to collaborate with a particular developer. However, this impact is nonlinear.

4. The reciprocity proxies positively affect the probability that a developer gains co-workers (one of the motivational factors for developers is reciprocity).

5. There exists a significant network effect emerging from standardization: the users of the most popular programming languages tend to be more likely to have collaborators.

Hypotheses 1 and 2 are auxiliary and more technical ones; they allow us to investigate the topology of the networks. Since we utilize a new, combined dataset, we need to know whether the stylized facts still hold (Hypothesis 1). Finding out whether the networks created within GitHub are interdependent (Hypothesis 2) gives a partial foundation for the later empirical analysis. Hypotheses 3, 4, and 5 relate to the network effect (standardization; Hypothesis 5) or the factors which should influence its strength: reputation and reciprocity.
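As an illustration of what checking Hypothesis 1 involves, the following minimal sketch estimates the exponent of a power-law tail by maximum likelihood. It is not the procedure used in the dissertation: it assumes the continuous approximation of the estimator, a cut-off `x_min` known in advance, and synthetic data in place of real degree sequences. The function names are hypothetical.

```python
import math
import random

def powerlaw_alpha_mle(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses alpha = 1 + n / sum(ln(x_i / x_min)) over the tail x_i >= x_min.
    For integer degrees this is only an approximation.
    """
    tail = [x for x in values if x >= x_min]
    if len(tail) < 2:
        raise ValueError("not enough observations above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, std_err

def sample_powerlaw(n, alpha, x_min, rng):
    """Draw n values from a continuous power law by inverse transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

# Synthetic stand-in for a degree sequence (true exponent 2.5).
rng = random.Random(42)
degrees = sample_powerlaw(20_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, std_err = powerlaw_alpha_mle(degrees, x_min=1.0)
print(f"estimated alpha = {alpha_hat:.2f} (std. err. {std_err:.2f})")
```

An estimate of the exponent alone does not establish power-law behavior; a fuller analysis would also select `x_min` from the data and compare the fit against alternatives such as the log-normal distribution.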

We will answer the following research questions:

1. To what extent are the most influential nodes shared among the networks?

2. What are the temporal patterns of interactions among GitHub users?

3. What motivational factors emerge from the activity of Open Source developers in GitHub?

4. Do GitHub users exhibit homophily while establishing links? If so, what are the patterns of homophilic relationships?

5. What are the roles of the homogeneous and heterogeneous links in Open Source projects hosted in GitHub?

The answers to the research questions provide a context for the qualitative study.

The literature does not specify what the sequence of events in GitHub is. However, this information is crucial for the modeling. If we know that there is an interdependence in the structures of the networks, and we can identify what type of links is the initial one, we may include the information from one network while analyzing another. However, we should not perceive the qualitative analysis as inferior to the quantitative part of this dissertation: those questions are also interesting on their own.

The remainder of the dissertation proceeds as follows. Chapter 1 will provide the setup for analyzing network effects. We will discuss the evolution of the term “network effects” or “network externalities”, starting with a brief review of closely related economic terms concerning external effects on utility, i.e., Veblen, bandwagon, or snob effects. Afterward, we will introduce the initial (traditional) setup for network externalities or effects and present the popular research topics on network effects. We will discuss in detail the limitations of the traditional understanding of the term, both resulting from the neo-classical economic framework and the initial setup suggested by Katz and Shapiro [KS85]. Later we will provide a set of guidelines which a revised approach in the modeling of network effects should follow. We will also investigate the available alternatives: from local network effects by Sundararajan [Sun08] and Agent-based Computational Economics [WW00; WWW00] to Discrete Choice Modeling [CS16]. Since one cannot see any of those approaches as complete, we will explore their limitations and advantages and suggest a synthesis. The conclusion will emphasize the need for applying social network analysis to a greater extent to studies on network effects.

Chapter 2 will present the Open Source Movement. Open Source is more than the type of a license; it turned into a social movement, which integrates developers with common purposes within communities. Because one cannot understand it correctly without recalling its history, we will outline the crucial events leading to the establishment of the current code of conduct among developers. The second part of the chapter will focus on the definitions. We will explain what Open Source software is, and what are the principles and practices related to its creation process.

To combat some myths around Open Source developers, suggesting that they are single, male enthusiasts-anarchists, we will summarize the results of the empirical research on the participants enrolled in this movement, answering the question “who is doing it”.

Next, we move to GitHub. GitHub is nowadays the most popular repository hosting service significantly influenced by Open Source principles. We will highlight how the Open Source principles emerge from the activity of GitHub users. We will show that GitHub, with its models of collaboration among developers, can be perceived as an example of an Open Source community. We will also describe the most popular events in the service together with the social networks emerging from them. Open Source and GitHub are not only interesting for computer engineers; they are also intriguing from the socio-economic point of view. In summary, we will investigate the areas that make Open Source a challenge for economic modeling.

Although Open Source has participated in the software market for over 20 years, there are still noticeable gaps in its scientific description.

Chapter 1 and Chapter 2 complete the introductory, review part. Chapters 3-6 cover the empirical research. Chapter 3 will present the methodological framework for modeling developers’ activity in GitHub. We will start by describing the activity of GitHub users in the language and notation of multidigraphs. Afterward, we will present the available data sources on GitHub and the way to merge those databases.

Analysis of such large datasets is difficult. Not only are attempts at visualization problematic; given that we work with graphs, we also have limited possibilities to reduce the size of the datasets, e.g., by sampling, because observations are not independent. Sometimes, it may be helpful to utilize techniques which translate the similarities in data to spatial relationships. That is why we will present the non-Euclidean geometries and how to introduce them into modeling. Like every data source, GitHub has its advantages and drawbacks; therefore we will conclude with a discussion of known threats to validity and their possible impact on the results presented in this dissertation.

No research is complete without a thorough descriptive analysis of the selected sample. Chapter 4 will present the preliminary analysis of the networks’ characteristics. There are studies which aim at the description of networks from GitHub from the social network analysis point of view (see, e.g., [LRM14]). We introduce a new, combined dataset, so it is crucial to find out whether it is a representative one. The insights from the descriptive analysis also help in justifying the choice of techniques of analysis. While the power-law behavior is a frequent topic in studies on GitHub, the interdependence of the networks is rarely analyzed. It is also not known what the sequential patterns among events in GitHub are.

Chapter 5 will present the results of two studies: a qualitative and a quantitative one. The first one investigates whether the motivational factors known from the literature on Open Source are present among GitHub users. The second study will analyze what makes a developer attractive for gaining collaborators, with particular attention paid to the impact of the network effects.

In Chapter 6, we will revisit collaboration among developers. A common assumption suggests that individuals tend to work with others who are similar to them. Homophily lowers the barriers to effective communication. However, studies on teamwork and the ability of a group to solve complex problems highlight that diversity and heterophilic connections play a critical role during collaboration, allowing for the diffusion of information. Therefore, we will investigate what the patterns behind the connections among GitHub users and their roles are. To this end, we will need a measure of similarity. We will utilize modified Self-Organizing Maps (Kohonen Networks) and combine the results with the analysis of textual artifacts.

We will conclude every empirical study with a discussion, both regarding the corresponding results from the literature and the limitations or threats to validity. If possible, we will validate the results with simulations. This dissertation also comes with additional online interactive material – visualizations and the source code.

All the software used in this dissertation is Open Source.

Acknowledgments

I am grateful to my supervisors: Dariusz T. Dziuba and Tomasz Kopczewski for their support during my work on this dissertation. Especially, thank you for not trying to impose any limits on my ideas, and allowing me to make mistakes or to find my way. Probably, it was the greatest lesson you could give me.

I am also indebted to all the teachers who sparked my interest in mathematical modeling and computer science. The same goes for my students, who sometimes make me review my perception of econometrics.

Of course, I am also grateful to my colleagues and friends at Warsaw University for fruitful discussions and providing a great atmosphere. Primarily, I would like to acknowledge Paweł Strawiński for his guidance at various stages of my Ph.D. studies.

Many thanks to Natalia Starzykowska and Kateryna Zabarina just for being there whenever I needed, sharing joys and doubts of doctoral studies. I am glad I can rely on you all.

Thanks go to Ramon Ferrer-i-Cancho for his encouragement to get rid of my limiting thinking schemes. You forced me to leave my comfort zone, but in a way that I was not afraid. Your belief in me was empowering. Also, to Krzysztof Ziemiański for his great sense of humor, listening abilities, a dose of criticism and a bit of advice so I could solve my mathematical or not-so-mathematical problems.

There is a life outside of the Faculty, that is why I would like to thank Kornel Olszewski, Michał Walczyński, Tomasz and Małgorzata Wesołowska, and Dorota Żukowska for their reminders about that simple fact. Your “what-ups” were invaluable.

To my family, for their incredible patience, unconditional love, and support.

Then and now. To my mother for her acceptance and understanding that I sometimes dive into my strange, hyperbolic world. Without you, I would not be where I am. To my husband for his extraordinary navigational skills in the non-Euclidean worlds, which not only led to our meeting but to successful collaboration as well.

Sometimes we lose people too early. I wish they could see the completion of this project. It is not only about my father. Here, I would like to mention my deceased supervisor Mirosława Lasek, who teamed me from my bachelor degree and guided me into the scientific world. Ironically, she admitted to having invited me to Ph.D. studies because of my self-reliance during work. Without doubt, this trait was critical when she suddenly passed away after my first year.

This dissertation results from a five-year research process. It is natural that during that period the partial results have already been discussed and presented to various audiences. I am very grateful to the anonymous referees for their careful reading of the earlier versions of the related papers. Many parts of this dissertation have been greatly improved as a result of their insightful and constructive comments.

In particular, partial results reported in this dissertation have been presented during the following conferences and seminars:

• Chapter 3: ICWSM 2017, EC & FENS 2017, NetSci 2017, FIT 2017;

• Chapter 4: EuroMed 2016, WIEM 2016, SM&S 2018;

• Chapter 5: ICWSM 2017, WIEM 2017, SM&S 2017, SM&S 2018, NetSci 2017, Sunbelt 2018;

• Chapter 6: NetSci 2018.

I also acknowledge the financial support of the National Science Centre, Poland, grant DEC-2016/21/N/HS4/02100.

Last but not least, I am also grateful to all Open Source community members.

If not for their creativity, which fascinated me, and their programs, which I enjoyed, this dissertation would never have come into being.


Chapter 1 Network effects in the context of social networks

The activity of the developers working within Open Source communities forms network structures. Although this statement is straightforward, its economic description is challenging. Network structures arise in environments in which the actions of the agents are not independent. However, classical economics assumes that individuals behave in isolation from others. Interconnected actions are an emerging area in economic studies, which means we lack an adequate framework to investigate them, both theoretically (definitions) and empirically (analytic tools). The network effect is an umbrella term for the phenomena which occur among individuals who exchange information in networks.

Open Source software communities are characterized by the network effects which result both from the consumption and production of this software. Since the literature on the network effects mostly investigates the impact of network effects on the demand side of the market, this chapter will not discuss the consequences of network effects for the supply side; we will briefly characterize them in Chapter 2.

This chapter introduces the definition of network effects. We discuss how the network effect has evolved: from the closely related economic terms (Veblen, snob, and bandwagon effects), through the distinction between “network effects” and “network externalities”, to the development of the most widely used term. After a brief review of topics in the literature on network effects, we present alternative frameworks and methods of modeling network effects, including game theory, Agent-based Computational Economics, and choice-based experiments. Next, we argue why the traditional understanding of the network effects, in which the value of an effect is a function of the number of individuals who joined the network, is not sufficient. We outline the limitations of the traditional setup and introduce a set of guidelines for the redefinition of the network effects. The chapter concludes with a proposal for how to redefine the network effects.


1.1 The historical perspective on network effects

Although the exact term “network externality” appeared for the first time in 1985 [KS85], related phenomena under different names have long been present in the economic literature.¹

Alongside the spread of the ideas of Marshall and Jevons, which defined the development of economic thought for at least the following 50 years, works contesting the assumptions of the homo oeconomicus model emerged. In the basic homo oeconomicus model, a consumer has complete information and decides whether or not to buy a commodity based solely on the price, income, and their individual preferences. However, economists noticed long ago that there are some kinds of goods for which demand cannot be described in such a simplified manner. Not only may agents possess incomplete information or have expectations that shape their economic behavior, but they also do not act in isolation from other agents.

That is why, before we introduce the network effects, we present the earlier approaches to modeling the interdependent actions of consumers. This way we answer the question about the origin of the research on network effects.

1.1.1 External effects on utility

Interdependence of economic activity within a group of agents has a long history in economic thought. Initially, it was described mostly from a philosophical or a sociological point of view (see, e.g., Smith’s ideas on individuals engaging in social comparisons and their “love for distinction” in [Smi59] or the works by Veblen, e.g., [Veb99]). However, it was not included in standard economic modeling before Leibenstein [Lei50], a pioneer of behavioral economics, provided a framework for translating the vague sociological constructs into the language of prices, utilities, and demands. He distinguished three types of external effects on utility: Veblen, bandwagon, and snob effects.

Veblen effect. Veblen’s core interest was interdependent preferences. They are affected by the way others consume the goods and defined by the preferences of social groups. In particular, Veblen worked on conspicuous consumption – a consumption of luxuries that is observed by others [Veb99]. Luxury goods are not necessities of life; their purpose is to attain or maintain a given social status. Conspicuous consumption is not limited to the leisure (upper) class. No matter the class, people compare their income, consumption, and perceived wealth. Even if Veblen is considered an economist, he rooted his works within sociological and psychological constructs; maybe this is the reason he did not significantly influence the mainstream. The description we provide here cannot be found explicitly in any of his works; it comes from Leibenstein’s presentation of the Veblen effect [Lei50] in the language of utility and prices.

¹ Note that “network effects” and “network externalities” tend to be used interchangeably in the literature, even if the meanings of both terms are not the same – we will discuss this later. However, the differences are usually negligible.

Leibenstein [Lei50] postulated a division of the price into two components: the real price and the conspicuous price. In the case of goods purchased for purposes of conspicuous consumption, the consumer’s utility depends not only on the quality of the good but also on the price which was paid for it. The real price is the price the consumer paid for the good in terms of money. The conspicuous price is the price the consumer thinks other people think she paid for the good [Lei50]. The resulting demand curve may be partially positively sloped.

In markets affected by the Veblen effect, the demand depends not only on the real price of the good but also on the price that consumers expect (the conspicuous price). Assume that the price of the commodity falls. If consumers did not distinguish between the real and conspicuous prices, the demand would increase. However, as the price falls, some consumers will also expect a lower conspicuous price and reduce their demand.
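The decomposition described above can be illustrated with a small numerical sketch. The linear demand function and the parameter values `a`, `b`, and `v` below are assumptions made purely for illustration; they are not Leibenstein’s specification. The sketch separates the reaction to a price drop into the price effect (conspicuous price held fixed) and the Veblen effect (conspicuous price follows the real price).

```python
def veblen_demand(real_price, conspicuous_price, a=100.0, b=2.0, v=3.0):
    """Quantity demanded under an assumed linear Veblen specification.

    b: sensitivity to the real price (ordinary price effect)
    v: sensitivity to the conspicuous price (Veblen effect)
    All parameter values are hypothetical.
    """
    return max(0.0, a - b * real_price + v * conspicuous_price)


def decompose_veblen(p_old, p_new, **params):
    """Split the reaction to a price change into price and Veblen effects."""
    q_old = veblen_demand(p_old, p_old, **params)
    # Price effect: the real price changes, the conspicuous price stays fixed.
    q_price_only = veblen_demand(p_new, p_old, **params)
    # Veblen effect: consumers now also expect the lower conspicuous price.
    q_new = veblen_demand(p_new, p_new, **params)
    return q_price_only - q_old, q_new - q_price_only, q_new - q_old


price_effect, veblen_effect, total = decompose_veblen(20.0, 15.0)
print(price_effect, veblen_effect, total)
```

With these assumed values the Veblen effect outweighs the price effect, so the quantity demanded falls when the price falls, which corresponds to a positively sloped segment of the demand curve. Choosing `v` smaller than `b` restores the usual negative slope.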

Snob effect. The snob effect is the extent to which the demand for the good decreases because other consumers also consume the same good [Lei50]. This effect captures love for diversity and willingness to be exclusive. The snob effect is similar to the Veblen effect, but the Veblen effect is a function of price, whereas the snob effect depends on the quantity of the commodity demanded in the market, regardless of price. In markets with snob effects, there is a negative correlation between a consumer’s demand and the total market demand.

We decompose the change in the quantity demanded by consumers after a price reduction into snob and price effects. If every consumer expected no change in the total quantity demanded due to the price reduction, the demand would increase. However, in our case, some snobs will react to the change by leaving the market. Their behavior will reduce the total quantity demanded.

The demand curves for snobs are less elastic than the ones in a situation with no snob effects. Rationality implies that the snob effect itself cannot exceed the price effect; the opposite would yield a contradiction. The quantity demanded at a lower price would be lower than the quantity demanded at a higher price, which would suggest that there are snobs who leave the market when there is a reduction in the total quantity demanded, which is not consistent with the definition of a snob. However, the snob effect combined with the Veblen effect can surpass the price effect [Lei50].

Bandwagon effect. Related to the snob effect is the bandwagon effect, i.e., the extent to which the demand for the good increases because other consumers also purchase this good. The bandwagon effect describes individuals’ tendency toward herd behavior and explains phenomena such as fashion trends or social taboos. Consumers may prefer to conform to other people, but the bandwagon effect may also stem from the exchange of information. We notice it in traditions or social taboos which may be confined to specific social groups, e.g., eating habits: the popularity of sauerkraut in Slavic regions, guinea pigs in Ecuador, or refraining from eating pork in Judaism. Members of the group perform (or refrain from performing) certain actions regardless of their own opinion, just because other people do so. Sometimes they may even unconsciously internalize the decision-making processes of their community and experience them as their own. Even though the consumer of mainstream economics has perfect information and makes decisions on her own, under the influence of bandwagon effects she may for some reason mimic the actions of others, giving up her own choices. An English phrase capturing the bandwagon effect is “keeping up with the Joneses” [Lei50].²

Let us assume that every consumer has complete preferences, knows the quantity demanded by all consumers collectively at any given price, and expects that a fixed amount will be taken off the market at all prices [Lei50]. We consider a drop in price. As a result, the demand increases, and we can decompose this increase into changes related to price and bandwagon effects. The price effect describes the change in the quantity demanded if there is no bandwagon effect and consumers do not adjust to each other’s demand. However, the bandwagon effect does occur, and an additional number of consumers enter the market or increase their demand. Leibenstein [Lei50] also notes that in the presence of bandwagon effects, the demand curve is more elastic than if the demand were based only on the functional attributes of the commodity.
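The decompositions for the bandwagon and snob effects can be sketched numerically. The example below assumes a linear aggregate demand Q = a - b·p + c·Q_expected, where the hypothetical parameter `c` measures how strongly consumers react to the total quantity they expect others to buy (c > 0 for a bandwagon effect, c < 0 for a snob effect). The functional form and all values are illustrative assumptions, not taken from [Lei50].

```python
def equilibrium_demand(price, a=100.0, b=2.0, c=0.0):
    """Total demand when Q = a - b*price + c*Q_expected and Q_expected == Q."""
    if c >= 1.0:
        raise ValueError("c must be below 1 for a finite equilibrium")
    return (a - b * price) / (1.0 - c)


def decompose_price_drop(p_old, p_new, a=100.0, b=2.0, c=0.0):
    """Return (price effect, external effect, total change in demand)."""
    q_old = equilibrium_demand(p_old, a, b, c)
    q_new = equilibrium_demand(p_new, a, b, c)
    # Price effect: consumers react to the new price but still expect
    # the old total quantity, i.e., they do not adjust to each other.
    q_price_only = a - b * p_new + c * q_old
    price_effect = q_price_only - q_old
    # External (bandwagon or snob) effect: the remaining adjustment once
    # expectations catch up with the new total quantity.
    external_effect = q_new - q_price_only
    return price_effect, external_effect, q_new - q_old


for label, c in [("bandwagon", 0.5), ("snob", -0.5)]:
    print(label, decompose_price_drop(20.0, 15.0, c=c))
```

Under these assumptions the bandwagon case yields a larger total response than the price effect alone (a more elastic demand curve), while in the snob case the external effect is negative but smaller in magnitude than the price effect, so the demand curve is less elastic yet still downward sloping.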

Veblen, snob, and bandwagon effects focus on social norms and attempt to translate their impact into the language of utilities. Moreover, we may perceive them as means of group attribution or signaling devices (I belong to the same/different group as you), which in turn influence the interactions between individuals.³ Bandwagon effects that do not necessarily relate to social norms or codes of conduct and that typically arise in the markets for information were given a separate name: network effects (or externalities).

1.1.2 Traditional view on network effect

The common characteristic of the effects described in the previous section is that the well-being of the consumers is directly affected by the actions of other agents in the economy. We will call such situations externalities [MCWG95]. An externality favorable to the recipient is called positive; a negative externality is the converse. In a general economic framework, the size of the externality may be mediated, e.g., by bargaining processes, which in turn require means to trade. In telecommunications, information-related technologies, or the computer industry, we can hardly find possibilities for trade: we know the value of information only after processing (consuming) it, and information is indivisible. Depending on the context, the value of the information may be conditional on the number of other entities who process it. Consumers of information take binary actions. Special types of externalities from the information technology markets, which offer limited possibilities to trade and usually emerge in networked environments, were initially named network externalities. Later, economists noticed that this term does not accurately describe the phenomenon, which is why they coined the term network effect.

² Herd behaviors or social taboos, even if kept rather outside of mainstream economics, made a popular topic in literature, especially satire. Typical Polish examples tackling problems of conformity and social norms are “Morality of Mrs. Dulska” by Zapolska and “Tango” by Mrożek. The first one contains an iconic scene of the main heroine forbidding her husband from taking his daily walk for health because of “what would others say”, making him walk around the table instead. The second one describes a world with no social norms where there is nothing left to oppose.

³ Personal communication with Iana Okhrimenko, FES UW.

Although the literature on network effects is quite broad, this section focuses on the papers which in the author’s opinion shaped the most popular understanding of the term. In 1985 Katz and Shapiro [KS85] introduced the term “network externality” – a situation in which the utility derived by a consumer of the commodity increases with the number of other agents consuming this commodity.⁴ They provided three possible sources of such positive externality:

1. A direct network externality. The telephone is a typical example. The more households have joined the telephone network, the more opportunities to communicate via phone there are for a new household deciding whether to connect to the network. Accumulation of users may lead to an increased willingness to adopt a given solution, which in turn may become a standard.

2. An indirect network externality. It arises, e.g., among hardware purchasers. A consumer deciding to buy a given type of hardware architecture is usually concerned about how many other consumers have already purchased the same product. The more popular a given architecture becomes, the more probable it is that the variety of compatible software will increase. The consumer’s utility does not increase directly because other consumers purchase the given hardware, but because the accumulating user base improves the quality of the product.

3. A network externality derived from complementary services or products. As soon as the given product becomes a standard, we observe a significant increase in the supply of complementary services. By a “standard” we understand any product or technology incorporating specifications that provide for compatibility [WBK06]. If the product gains a significant market share, the consumers have even more incentives to purchase it. Katz and Shapiro [KS85] provide the example of the automobile market: foreign manufacturers’ sales were initially slowed down by consumers’ awareness of the lower availability of post-purchase service networks for new or less popular brands.

Katz and Shapiro [KS85] provided the first utility-based definition, but they were not primarily interested in the consumers’ decisions. They analyzed the enterprises’ side of the market instead. They focused on how many firms will coexist in a market affected by network externalities and what the equilibrium conditions for such markets are. They analyzed a simple, static model of oligopoly in which firms adjust to the demand affected by network externalities.

⁴ The problem itself was not new; studies on communication services had already incorporated similar ideas, see, e.g., [Roh74]. It is unclear why Katz and Shapiro’s paper became so iconic – probably because of naming the phenomenon (and a surprisingly limited list of references).

If a market is affected by network externalities, consumers form expectations regarding the sizes of competing networks. The firms treat those expectations as given, which makes expectations the core factor determining the size of the market. For some sets of expectations, a single enterprise is enough, while for other sets of expectations, there is a group of enterprises in the market [KS85]. Firms not only decide whether to enter or stay in the market but also decide whether to provide a compatible product. If the network externalities are significant, this choice can be one of the crucial factors of later market performance. Conditional on the compatibility decision, different sets of expectations are taken into account. In the case of incompatibility, expectations related to the network of the individual enterprise’s consumers are the relevant ones. For a compatible product, we should analyze the network of the aggregate market. Consumption externalities shape the environment for the demand-side economies of scale by making consumers form expectations.

In the model proposed by Katz and Shapiro [KS85], there are no income effects, and consumers act to maximize their surplus. Consumers purchase one or more units of the good from at most one brand. We consider a market for durable goods. Their values depend on the consumers’ expectations regarding the sizes of the networks of buyers. All consumers have the same expectations of network sizes, and their valuation of the network externality is identical. However, their willingness to pay for the good is heterogeneous. Networks of consumers are homogeneous; if two networks are of equal size, they are treated as perfect substitutes. Each consumer buys the brand of the product which maximizes her surplus. If the surpluses calculated from the consumer’s expectations regarding the sizes of the networks are negative, the consumer will quit the market.
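The consumer side described above can be illustrated numerically. The following Python sketch is a deliberate simplification and not Katz and Shapiro’s specification: it assumes a single network, one unit per buyer, basic willingness to pay spread uniformly with density one up to a hypothetical ceiling `a`, and a hypothetical square-root network value `k * sqrt(y)`. It iterates on the expected network size until expectations are fulfilled, i.e., until the size consumers expect equals the size that results from their purchases.

```python
import math


def network_value(expected_size, k=1.0):
    """Assumed concave value that every consumer attaches to the network."""
    return k * math.sqrt(max(expected_size, 0.0))


def demand(price, expected_size, a=10.0, k=1.0):
    """Number of buyers for a given price and expected network size.

    A consumer with basic willingness to pay r buys if her surplus
    r + network_value(expected_size) - price is non-negative; with r
    spread uniformly (density one) up to a, the mass of buyers is
    a + network_value(expected_size) - price.
    """
    return max(0.0, a + network_value(expected_size, k) - price)


def fulfilled_expectations_size(price, a=10.0, k=1.0, tol=1e-10, max_iter=10_000):
    """Iterate expectations until expected and realized sizes coincide."""
    size = 0.0
    for _ in range(max_iter):
        new_size = demand(price, size, a, k)
        if abs(new_size - size) < tol:
            return new_size
        size = new_size
    raise RuntimeError("expectations did not converge")


print(demand(4.0, expected_size=0.0))    # buyers if nobody is expected to join
print(fulfilled_expectations_size(4.0))  # network size with fulfilled expectations
```

The gap between the two printed values shows how much of the demand in this toy setup is attributable to the expected size of the network rather than to the stand-alone value of the good.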

Firms take the consumers’ expectations as given. The price a firm receives depends on the expected size of the consumers’ network and the total unit sales of other enterprises [KS85]. The second assumption comes from the standard Cournot model. There are two types of costs that every enterprise has to bear: the costs of the production and the cost of achieving compatibility (assumed to be fixed costs, they do not have to be homogeneous).

In Katz and Shapiro’s model [KS85] there are three different equilibria available:

1. A complete compatibility – we have only one network because consumers perceive the products as substitutes; this is a case of perfectly competitive equilibrium. This equilibrium is symmetric; every firm produces the same amount of good;

2. A complete incompatibility – a case in which any two brands are incompatible with one another. The solution is given in the form of an equilibrium reaction correspondence. The result is similar to the one obtained in the standard oligopoly model; however, the equilibrium reaction correspondence is not to be confused with a standard reaction function (there are different reaction functions for every set of expectations). The possible scenarios include symmetric oligopolies, natural oligopolies with groups of inactive firms, and asymmetric oligopolies, in which an enterprise is successful (enjoys a large market share) just because the consumers expect so;

3. A partial compatibility – with similar equilibria as in oligopoly.

Katz and Shapiro [KS85] state that the complete compatibility leads to the highest level of total output. However, the highest level of total output does not mean that the compatibility always increases the profit of each firm. Two technologies allow for making a product compatible: a joint adoption of the product standard and construction of the adapter [KS85]. In the case of negligible costs of adapting and lack of additional barriers to entry, the market is perfectly competitive.

Farrell and Saloner [FS85] presented similar ideas and a discussion of network externalities at about the same time as Katz and Shapiro. They investigated the problem of coordinating innovation or a standard in markets where incompatible products are at a disadvantage. They utilized a game theory approach, analyzing only firms’ decisions (whether or not to adopt a new standard). Their results showed that if there is inefficient innovation (also called inertia) in the market, communication among enterprises is not sufficient to initiate the switch to a new product. Initially, their work was less influential than [KS85]. However, it provided an alternative framework to model network externalities, leading eventually to studies utilizing Agent-based Computational Economics.

Church and Gandal [CG92] introduced the term “network effect” for the first time. They examined the software provision decisions made by firms. They developed a model of a market with two hardware technologies, a number of firms competing in the software market, and consumers who value software variety. Their results showed that the profit made by a software provider depends on two effects. The first emerges when the number of compatible software products for a given hardware increases. The technology becomes more valuable the more consumers decide to purchase those products; it is the network effect.⁵ The second effect is the competitive effect, resulting from the number of software firms joining the market while the network remains the same. The network effect increases with the value placed by consumers on variety, the number of software products, and the disposable income. If a hardware platform offers little variety of software, further reductions in the number of available applications will lower the network effect.

Liebowitz and Margolis [LM94; LM95a] questioned whether all network effects are network externalities and whether the presence of network effects always induces market failure. They generalized the definition of network effects, showing that there are networks that can be owned (“sponsored” in the terminology of Katz and Shapiro; e.g., the telephone network – one cannot connect without permission) and “metaphorical networks” (e.g., a network of speakers of English). As a result, individuals are no longer confined to purchase decisions. They may choose whether or not to perform an action whose value is affected by the number of other individuals performing it. There can be both positive and negative network effects: the costs of participation may decline or rise depending on the number of other participants. A network externality is a special kind of network effect in which the equilibrium exhibits unexploited gains from trade on joining the network – the authors argued that such a distinction preserves the original understanding of the externality as an instance of market failure. They concluded that network effects are pervasive in the economy, but the evidence on network externalities is limited, as they are often confused with technological progress. Many of the conventional externalities could be internalized by a configuration of ownership or transactions among agents.

After Liebowitz and Margolis’ adjustments, the setup for modeling network effects was sufficient for most practical purposes. Although authors noticed the limitations, they either neglected them or applied the improvements to the empirical models without explicit changes in the underlying theory. However, the theoretical propositions by Weitzel et al. [WWW00] and Sundararajan [Sun08] reshaped the initial setup significantly. They challenge the initial setup, which is why we cannot view them as mere late follow-ups.

⁵ Note that in this understanding, the network effect captured all of the externalities described by Katz and Shapiro [KS85].


1.2 Common areas of research on network effects

Defining a term is usually only the first step in research, which is why this section presents the typical problems covered in the literature on network effects. The authors utilized the traditional understanding of the term; they introduced possible changes into empirical models without theoretical discussions.

Standardization and technology adoption. Standardization was among the very first topics investigated by authors and has remained one of the most prevalent ones; examples include [KS85], [FS85], [CG92], or [WBK06]. Katz and Shapiro [KS86] analyzed this issue from the enterprise point of view. They investigated technology adoption by firms, especially in the case of sponsored technologies. They showed that if sponsors are absent, the currently superior technology has a strategic advantage, leading to its domination over the market. Inferior technologies may be adopted if they are sponsored. Additionally, if competing technologies are sponsored, the technology expected to be superior in the future will gain a strategic advantage.

Lock-in effect and switching costs. The emergence of a standard among technologies competing for adoption may correlate with the so-called lock-in effect [Art89]. Assuming increasing returns to adoption, even if consumers are initially indifferent between the alternatives, sooner or later the system will lock in to one of the competing alternatives. A newly arriving consumer will then be bound to choose the technology that is the standard on arrival, no matter what her preferences are.

Which technology becomes the dominant one depends on history and on the previous choices of the agents. “Natural inclinations” [Art89] do not play a dominant role in decision making; information about the standards one should benefit from conforming to may modify them. Moreover, those standards may not be the superior ones. The assumption of increasing returns is crucial here: the model with diminishing returns does not have any absorbing states (lock-in does not occur). Even if lock-ins are usually robust against changes, break-outs remain possible [LB98; Ley01].
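The role of increasing returns can be illustrated with a short simulation. The following is a minimal sketch in the spirit of the sequential adoption process described in [Art89], not a reproduction of the original model. The two agent types, the base payoffs, the returns parameter of 0.25 or -0.25, and the function name simulate_adoption are all assumptions made for illustration.

```python
import random


def simulate_adoption(returns, steps=2000, seed=0):
    """Simulate sequential choice between technologies A and B.

    Two agent types arrive in random order: type "R" has a base
    preference for A, type "S" for B (hypothetical payoffs below).
    The payoff of each technology also changes by `returns` per
    previous adopter: returns > 0 models increasing returns to
    adoption, returns < 0 models diminishing returns.

    Returns the final adopter counts (n_a, n_b).
    """
    rng = random.Random(seed)
    # Assumed base payoffs: (payoff from A, payoff from B) for each type.
    base = {"R": (1.0, 0.0), "S": (0.0, 1.0)}
    n_a = 0
    n_b = 0
    for _ in range(steps):
        agent = rng.choice(["R", "S"])
        payoff_a = base[agent][0] + returns * n_a
        payoff_b = base[agent][1] + returns * n_b
        # Each agent picks the technology with the higher current payoff
        # (ties go to B; this is an arbitrary convention of the sketch).
        if payoff_a > payoff_b:
            n_a += 1
        else:
            n_b += 1
    return n_a, n_b
```

Under these assumptions, a call such as simulate_adoption(0.25) behaves like a random walk until the lead of one technology outweighs the base preferences; from then on both agent types choose the leader and the gap keeps growing, which corresponds to an absorbing state. With simulate_adoption(-0.25), a lead makes the leading technology less attractive, so the gap between the two counts stays within a few adopters and neither technology takes over the market. Which technology wins under increasing returns depends only on the order of arrivals, that is, on the seed.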

Liebowitz [Lie02] distinguishes two forms of lock-in: a strong and a weak lock-in. A strong lock-in occurs when a consumer does not choose a better product, even if its superiority guarantees compensation for any issues related to self-perceived costs of a change (e.g., costs of learning or becoming familiar with the new product). Consumers do not switch because they are afraid of losing compatibility with others. In weak lock-ins, consumers find it inefficient to switch to a superior product because of the self-compatibility costs. Liebowitz [Lie02] argues that weak lock-ins are common and that they are not related to network effects or economies of scale. Strong lock-ins cause inefficiencies.

Switching costs are real or perceived costs that are incurred when changing the network one belongs to, but which are not incurred by remaining within the same network [PWM03]. They are different from network effects, even if they have similar consequences for market competition and consumer lock-in [CS11]. Klemperer [Kle95] suggests that general reluctance to switch may result from uncertainty or incompatibility. Incompatibility and strong network effects allow the largest firms to maintain market share [Eco96]. Liebowitz [Lie02] identifies switching costs as the self-compatibility costs and shows the relation between the switching costs and
