Summary of the test series

On the basis of the results of the test series, it can be stated that the system is highly effective. In the first scenario, an accuracy of 0.91, a precision of 0.89, and an F-measure of 0.90 confirm its close agreement with the judgments of a panel of human arbiters, the kind of panel normally used to assess data and cases of this type.
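As a reminder of how the three reported measures relate to one another, the following is a minimal sketch using the standard definitions of accuracy, precision, recall, and the balanced F-measure. The function names and the toy confusion-matrix counts are illustrative assumptions, not values from the tests; only the figures 0.89 and 0.90 come from the text above.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def recall_from_f(precision: float, f_measure: float) -> float:
    """Recall implied by a given precision and balanced F-measure.

    Rearranged from F = 2PR / (P + R), which gives R = FP / (2P - F).
    """
    return f_measure * precision / (2 * precision - f_measure)


# Hypothetical counts, chosen only to exercise the formulas.
print(metrics(tp=8, fp=2, fn=2, tn=8))

# Recall implied by the rounded values quoted above (P = 0.89, F = 0.90).
print(round(recall_from_f(0.89, 0.90), 2))  # 0.91
```

Because the quoted precision and F-measure are rounded to two decimal places, the implied recall of about 0.91 is only approximate.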

In addition, the way the system presents information lets human arbiters notice further elements that a person could not notice through standard, direct exploration of the source. These are often key elements that indicate the use of multi-identities by a single physical person.

The second scenario also confirms the high effectiveness of the system: one pair was ranked 8th among almost 370 thousand pairs, and the similarity results were higher than 98.81% for all the examined pairs. Moreover, this scenario was based on real virtual multi-identities, detected by service administrators on the basis of data available only to them (IP and email addresses).
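To make the notion of a pair's position among all pairs concrete, here is a small sketch of how such a rank could be computed from a table of pairwise similarity scores. The identity names, the scores, and the helper functions are hypothetical; the system's actual ranking procedure is not described in this section.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs that can be formed from n items."""
    return n * (n - 1) // 2


def rank_of_pair(scores: dict, target: tuple) -> int:
    """1-based rank of the target pair when pairs are sorted by descending score."""
    target_score = scores[frozenset(target)]
    return 1 + sum(1 for s in scores.values() if s > target_score)


# Toy similarity table for four hypothetical identities (six unordered pairs).
scores = {
    frozenset({"anna", "bob"}): 0.42,
    frozenset({"anna", "carl"}): 0.97,
    frozenset({"anna", "dora"}): 0.15,
    frozenset({"bob", "carl"}): 0.38,
    frozenset({"bob", "dora"}): 0.88,
    frozenset({"carl", "dora"}): 0.21,
}

print(pair_count(4))                           # 6
print(rank_of_pair(scores, ("bob", "dora")))   # 2

# Scale illustration only: 860 items would give 369,370 unordered pairs.
# The number of identities actually compared is not stated in this section.
print(pair_count(860))                         # 369370
```

The quadratic growth of `pair_count` shows why a known pair appearing near the top of the ranking is informative: with hundreds of thousands of candidate pairs, a high position is very unlikely to occur by chance.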

On the basis of the results obtained, it can be stated that the presented system is an effective tool supporting the exploration of text-based social networks in order to detect hidden multi-identities. Manually analysing the volume of data that such sources usually contain is infeasible both for a single person and for a group of people.

In addition, through appropriate modelling of the characteristics, the system makes it possible to observe similarities between certain features that are not visible during direct, standard exploration of the data by a human.

9 Final conclusions

The emergence of a virtual world within the dynamically developing Internet is linked to a range of processes and phenomena that carry over, sometimes quite unexpectedly, into the real world. Their effects are often very positive, but sometimes they raise concerns or even create threats.

The first group includes the emergence of social networks, described in more detail in Section 2.1, which make it possible to establish contacts, exchange views and opinions, and undertake joint action. The second includes the appearance of virtual multi-identities, which allow people to conceal who they are and, as a consequence, to escape responsibility for their actions.

The aim of this dissertation was to create methods and tools for identifying a certain group of virtual multi-identities and, consequently, for limiting the undesirable effects and threats associated with their occurrence.

The research covered the development of a model concept, followed by the system architecture and the algorithmic solutions that together make up a novel approach to finding multi-identities hidden in social networks. To verify the effectiveness of the proposed solutions, a system was designed and implemented that made it possible to test the hypothesis presented at the outset.

The system contains components that made it possible to retrieve, from open Internet sources, information about virtual identities operating within social networks.

In the course of the work, the following were implemented and tested:

• the system architecture, with respect to effectiveness and performance,

• the crawl and data storage component, with respect to retrieving data from the available sources and making it available later,

• the algorithms for generating characteristics, which form the basis for finding similarity between virtual identities,

• the algorithms for determining similarities of the features that characterise individual virtual identities,

• the algorithms for determining similarities of virtual identities, with respect to their effectiveness in finding multi-identities hidden in social networks.

The most important original properties of the presented solution are:

• the definition of novel sets of features of virtual identities, enabling …,

• the ability of the system to operate without using personal data,

• the ability to maintain high effectiveness as the number of virtual identities grows.
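The three algorithmic stages named among the implemented components (generating characteristics, comparing features, and comparing identities) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: character trigram and word-length profiles stand in for the characteristics, cosine similarity stands in for the feature comparison, and a fixed weighted average stands in for the identity comparison. It does not reproduce the feature sets or algorithms of the presented system.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Characteristic 1 (hypothetical): frequency profile of character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_lengths(text: str) -> Counter:
    """Characteristic 2 (hypothetical): distribution of word lengths."""
    return Counter(len(word) for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Feature-level similarity: cosine of two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Extractor and weight per characteristic; the weights are arbitrary and sum to 1.
FEATURES = {
    "char_trigrams": (char_ngrams, 0.7),
    "word_lengths": (word_lengths, 0.3),
}


def identity_similarity(text_a: str, text_b: str) -> float:
    """Identity-level similarity: weighted average of feature similarities."""
    return sum(weight * cosine(extract(text_a), extract(text_b))
               for extract, weight in FEATURES.values())


posts_a = "I totally recommend this seller, great stuff!!"
posts_b = "Totally recommend this shop, great stuff!!"
posts_c = "Delivery was late and the packaging was damaged."

print(identity_similarity(posts_a, posts_b) > identity_similarity(posts_a, posts_c))
```

Note that the sketch works only on the text an identity has published, which is in line with the stated property that no personal data is required.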

The results of the system tests, described in detail in Chapter 8, confirmed the practical effectiveness of the solution. The system is therefore an effective tool that can help solve problems concerning the identification of virtual identities, including the detection of paid opinions in social networks.

The estimates of the "accuracy" and "F-measure" parameters reached values close to 90% in the tests, which indicates the high effectiveness of the diagnoses returned by the system.

In addition, the amount of information the system can process and analyse far exceeds the capabilities of a single person, or even of a group of people. The number of virtual identities that the system can analyse is considerably higher than in currently available solutions in the field of authorship analysis. The system's effectiveness with a large number of authors is also better than that of the known solutions in this field (described in detail in Section 3.4.3).

The presented solution can be used in a wide range of research in cybercrime detection and social network analysis. Detecting trade in illegal goods and substances, incitement to racial hatred, extortion, child pornography, and criminal or terrorist groups are only selected examples of public security tasks to which the system can be applied once its individual components have been suitably adapted. At the same time, the system can serve as a platform for testing new algorithms for generating characteristics, determining their individual features, and finding similarities between virtual identities.

It is worth mentioning that a significant difficulty encountered during the research was obtaining real data that would allow a test result to be confirmed unambiguously. Such confirmation can be obtained only on the basis of Internet users' personal data, which in general are unavailable by law. In specific situations these data may be disclosed where there is a justified suspicion that a crime has been committed.

Looking ahead, further improvement of the system is planned, both by introducing new characteristics of virtual identities and modifying the similarity detection algorithms, and by extending the interface, among other things by creating a graphical presentation of the results that makes them easier for the user to interpret.

