

Yugoslav corpus

What remains of Yugoslavia? From the geopolitical space of Yugoslavia
to the virtual space of the Web Yugosphere

By Francesco Mazzucchelli


Abstract

The paper works from a double hypothesis: (1) a Yugoslav socio-cultural space still exists in spite of the dissolution of the former Socialist Federal Republic of Yugoslavia; (2) the communities “occupying” this space can be considered, to some extent, “diasporic”, if the “Yugoslav diaspora” is defined not only by the geographic displacement of people but also by the loosening of the connections between the members of an ex-nation who still consider themselves a national community. The “space” mapped in the essay is the so-called “virtual space” of the Web, including all websites that reconnect to the “cultural languages” of the “past-country”.

Introduction and findings




Towards e-Diasporas 2



Resources

Download full working paper
Access the interactive graphs
Researcher's biography

Francesco Mazzucchelli holds a PhD from the University of Bologna (Italy) and is a post-doctoral research fellow in the Department of Disciplines of Communication of the same university. In 2011, he was appointed Visiting Research Fellow at the School of Environment and Development of the University of Manchester (UK). His research focuses on the relation between memory and spatiality, but his interests range over semiotics, discourse analysis, urban studies, memory and identity studies, Balkan studies and ethnography. He is the author of several academic papers and of the book Urbicidio. Il senso dei luoghi tra distruzioni e ricostruzioni in ex Jugoslavia (Bononia University Press, 2010).

Concepts, Tools & Methodology

Dana Diminescu, Mathieu Jacomy & Matthieu Renault


1) Shaping Concepts

e–DIASPORA: A migrant community that organises itself and acts via various digital media, particularly on the web, and whose practices are those of a community whose interactions are ‘enhanced’ by digital exchange. An e–diaspora is a dispersed collectivity1. It is both ‘online’ and ‘offline’, so what interests us is both the digital ‘translations’ of ‘physical’ actors/phenomena (the online activities of associations, for example) and the specifically (‘natively’) digital actors/phenomena (e.g., a forum and its internal interactions), sometimes called pure players. The question of ‘rub–offs’ (reciprocal influence between these two sorts of web entity) is of capital importance in the analysis of an e–diaspora. Research carried out in the context of the e–Diasporas Project thus presupposes a knowledge of the diaspora in question and, building on the exploration of the web, calls for new fieldwork. It also implies a knowledge of the web and an appreciation of the singularity of the exchanges that take place there.

DIASPORA WEB: An ensemble of the ‘migrant sites’ and ‘neighbouring sites’ (cf. infra) of a given diaspora, whether such sites be ‘living’ or ‘dead’ (cf. infra, ‘dead site’). In a sense, the web ecosystem of a diaspora.

e–DIASPORA CORPUS: The constitution of a corpus of websites is the method used to ‘capture’ an e–diaspora. A matter of breakdown and selection that allows a diaspora web to be extracted, it is also a task of definition, in that a diaspora web presents itself to a researcher only as the product of this ‘excision’ performed upon the web. Likewise, it is only through such exploration/selection, this filtering/circumscription of a corpus, that the notion of a migrant site takes on meaning.

MIGRANT SITE: A website created or managed by migrants and/or dealing with them (at any rate, a site for which migration is a defining theme). This could be a personal site or blog, the site of an association, a portal/forum, an institutional site, or anything similar. Usage is not the criterion: a site often consulted by migrants (a media site, for example) is not necessarily a migrant site. What counts is activity, first and foremost the production of content and the practice of citation (hyperlinks). Note also that a migrant site need not be located in a foreign country; it may just as easily be in the country of origin. Migrant sites testify to a given e–diaspora's occupation of the web.

NEIGHBOURING SITE: A non–migrant site (or one belonging to an e–diaspora other than the one being studied) that stands out through its strong connection with the (migrant) sites of a given e–diaspora (governmental or media sites of the country of origin, for example). However, not every site strongly linked to an e–diaspora is a neighbouring site. To qualify, it must be ‘specific’ to the diaspora in question, which is why sites ‘on the fringes of’ the majority of web communities, particularly those in the upper layers of the web (Google, YouTube, Facebook and so on; cf. the diagram below), are not counted as ‘neighbours’. In the e–Diasporas Project, a list of neighbouring sites may be drawn up alongside the list of migrant sites. Neighbouring sites discovered during the prospecting phase are not crawled during subsequent prospection, only during the validation phase, so as to gather all their links with the migrant sites.

1 That is to say “a heterogeneous entity whose existence rests on an elaboration of a common direction, a direction not defined once and for all but which is constantly renegotiated throughout the evolution of the collective” (Turner). Furthermore, we prefer the term e–diaspora to that of ‘digital diaspora’, since the latter risks becoming a source of confusion given the increasingly frequent use of the notions of ‘digital native’ and ‘digital immigrant’ in a ‘generational’ sense (distinguishing those born before the digital era from those born during or after it). The object of the e–Diasporas Atlas is not the ‘digital migrant’ but the migrant online.

2) Methodology and Tools



NAVICRAWLER: Navicrawler is an extension for the Firefox web browser: a semi–automatic search tool that analyses the structure and content of pages and hyperlinks in order to assist the user during a browsing session. It helps researchers build a corpus of websites related to their topic of study. The tool was developed chiefly by Mathieu Jacomy in the framework of the ICT-Migrations research program.
Download and documentation: http://webatlas.fr/wp/navicrawler/


GEPHI: Gephi is an interactive visualisation and exploration tool for all types of networks, complex systems, and dynamic and hierarchical graphs. In the e-Diasporas Atlas, it has been used to visualise and interpret the structuring and distribution of actors in migrant-community networks on the Web. The Gephi project was initiated by Mathieu Jacomy, in the context of the ICT-Migrations research program, and was then developed by Mathieu Bastian and Sébastien Heymann.
Download and documentation: http://gephi.org/
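One way to hand a crawled corpus to Gephi is to write it as two CSV tables, one for nodes and one for edges. The sketch below assumes Gephi's spreadsheet import, which, as we understand it, recognises `Id`/`Label` columns for nodes and `Source`/`Target`/`Type` columns for edges; the function name `to_gephi_csv` and the `category` column are hypothetical choices. Carrying the researcher's classification as a node column is what would later allow maps to be coloured by category.

```python
import csv
import io


def to_gephi_csv(arcs, categories):
    """Return (nodes_csv, edges_csv) as strings for a spreadsheet import.

    arcs: iterable of (source_site, target_site) hyperlinks.
    categories: dict mapping each site to the researcher's classification
    (the 'category' column name is a hypothetical choice).
    """
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "category"])
    for site in sorted(categories):
        writer.writerow([site, site, categories[site]])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    for source, target in sorted(set(arcs)):  # deduplicate repeated links
        writer.writerow([source, target, "Directed"])

    return nodes.getvalue(), edges.getvalue()


nodes_csv, edges_csv = to_gephi_csv(
    [("a.org", "b.org"), ("a.org", "b.org")],
    {"a.org": "association", "b.org": "blog"},
)
print(edges_csv)
```

The two strings can be saved as separate files and imported as a nodes table and an edges table.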


THE AUTOMATIC CRAWL (following the creation of a first Navicrawler corpus): A crawler is a ‘robot’ (computer program) that automatically browses the web starting from a given list of URLs and follows all hyperlinks on the pages visited. A depth of exploration (the number of successive ‘external’ hyperlinks, i.e. links from one site to another, to be followed) is fixed as the crawl parameter, and the results are stored in the form of a graph whose nodes are pages or websites (sites, in the case of e–Diasporas) and whose arcs are the hyperlinks connecting them (note that, where e–Diasporas is concerned, site content is not indexed). For example, a crawl to distance 1 (visits to the initial sites plus those linked to them) on a corpus of around fifty sites will yield thousands, even hundreds of thousands, of results. Most of these are anything but pertinent to the given corpus (we estimate that only 1–10% are), so the results then need to be filtered. e–Diasporas uses two indicators: the number of initial sites that link to a given discovered site, and the number of initial sites to which that site itself links.
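The crawl-then-filter procedure can be sketched as follows. This is a minimal illustration, not the crawler used by e–Diasporas: fetching pages is replaced by a hypothetical in-memory `outlinks` dictionary, and the thresholds of `filter_candidates` (and the rule that passing either indicator is enough) are assumptions. Only the two indicators themselves come from the text.

```python
from collections import deque


def crawl(seeds, outlinks, depth=1):
    """Breadth-first crawl over site-to-site hyperlinks.

    outlinks: dict mapping a site to the sites it links to; it stands in
    for actually fetching pages (hypothetical in-memory data, no network).
    Sites up to `depth` external hops from the seeds are visited, and every
    link leaving a visited site is recorded as an arc.
    """
    arcs = set()
    seen = set(seeds)
    frontier = deque((site, 0) for site in seeds)
    while frontier:
        site, dist = frontier.popleft()
        for target in outlinks.get(site, ()):
            arcs.add((site, target))
            # Only enqueue sites that are still within the crawl depth.
            if dist < depth and target not in seen:
                seen.add(target)
                frontier.append((target, dist + 1))
    return arcs


def filter_candidates(seeds, arcs, min_cited_by=2, min_cites=2):
    """Keep discovered sites that are well connected to the initial corpus.

    Indicators: how many initial sites cite the candidate, and how many
    initial sites the candidate cites. Thresholds are assumptions.
    """
    seeds = set(seeds)
    cited_by, cites = {}, {}
    for source, target in arcs:
        if source in seeds and target not in seeds:
            cited_by.setdefault(target, set()).add(source)
        if target in seeds and source not in seeds:
            cites.setdefault(source, set()).add(target)
    candidates = set(cited_by) | set(cites)
    return sorted(
        c for c in candidates
        if len(cited_by.get(c, ())) >= min_cited_by
        or len(cites.get(c, ())) >= min_cites
    )


outlinks = {
    "a.org": ["b.org", "news.example", "c.org"],
    "b.org": ["a.org", "news.example"],
    "news.example": ["a.org", "b.org", "far.example"],
    "far.example": ["a.org"],
}
seeds = ["a.org", "b.org"]
arcs = crawl(seeds, outlinks, depth=1)
print(filter_candidates(seeds, arcs))  # ['news.example']
```

Here `c.org` is cited by only one initial site and is discarded, while `far.example` lies beyond distance 1 and is never visited.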

THE e–DIASPORAS PLATFORM: A site that gathers together all ‘data’ for a particular section of the e–Diasporas Atlas once the validation crawl of a corpus is complete. In a certain sense this is a database of graphs and statistics (automatically generated by the platform based on the classification fields of the given corpus) and can be mobilised by the researcher in the writing of his or her article: http://maps.e-diasporas.fr

ADDITIONAL TOOLS: Over and above the tools specially developed for the e–Diasporas Atlas, other applications may be useful to researchers. This is particularly the case with whois, which facilitates the gathering of information about the geographical location of sites and their ‘registrants’ (sometimes the information concerns only the location of the web host hosting the site, which is of little value). Cf., for example, http://www.coolwhois.com. For other potentially useful tools developed by the Digital Methods Initiative in Amsterdam, cf. http://wiki.digitalmethods.net/Dmi/ToolDatabase

3) Graph Interpretation


In the context of the e–Diasporas Atlas, it is primarily the graph that allows research hypotheses to be formulated. Graphs serve less as demonstration or explanation than as the medium in which an interpretation of the data is constructed. They thus all have a heuristic function, their interpretation being an aspect of visual analytics.

MAPS — BETWEEN GRAPH AND TABLE: Graphs produced for the e–Diasporas project are quite particular. They are actually ‘minor graphs’, at least when compared to those handled by researchers in other fields. Our graphs have an essential qualitative aspect, insofar as the data are very precisely described/classified by the researchers. Taken on its own, independently of the classification of the sites in tabular form, the graph's topological structure (the hyperlinks) is of no interest to us. E–Diasporas maps are the product of a fusion between the structure of graphs and the structure of tables.

Since the networks themselves are small, one can easily ‘see’ the structures, which are never particularly complicated: a small number of more or less dense ‘packets’ linked in certain ways, occasionally with points or centres and, possibly, a polarisation (cf. infra). The question is ‘Why is it like this?’, and the answer lies in categorisation. One intuitively finds that such–and–such a packet corresponds to such–and–such a particular sort of site, polarises according to other criteria, and so on (cf. infra). One examines how the sites are distributed across several classification criteria; one then needs only to count them and draw up proportions and cross-tabulations using classical statistics, ignoring the links, just as if one were dealing with the simple columns of a table describing the sites involved.
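The ‘table’ side of this reading amounts to a cross-tabulation of clusters against the values of one classification field. A minimal sketch, assuming each site has already been assigned a cluster label (read off the map) and a category; the function names and sample data are hypothetical. As the text says, the hyperlinks play no part in this step.

```python
from collections import Counter


def crosstab(cluster_of, category_of):
    """Count sites per (cluster, category), ignoring the hyperlinks entirely.

    cluster_of: site -> cluster label (e.g. read off the map foundation).
    category_of: site -> value of one classification field.
    """
    counts = Counter((cluster_of[site], category_of[site]) for site in cluster_of)
    table = {}
    for (cluster, category), n in counts.items():
        table.setdefault(cluster, {})[category] = n
    return table


def proportions(row):
    """Turn one row of the table into shares of the row total."""
    total = sum(row.values())
    return {category: n / total for category, n in row.items()}


cluster_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
category_of = {"a": "religious", "b": "religious", "c": "media",
               "d": "association", "e": "association"}
table = crosstab(cluster_of, category_of)
print(table)  # {1: {'religious': 2, 'media': 1}, 2: {'association': 2}}
print(proportions(table[1]))
```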

THE VARIOUS TYPES OF GRAPH/MAP: There are several kinds of map in the e–Diasporas platform. On its front page the researcher will first find a series of small maps giving an overall view of the various corpora. These are known simply as thumbnail maps. Clicking on the section that interests them, they will then access a general, ‘unicolour’ map devoid of classifications. This is the so–called map foundation. Exploring this map, they will then gain access to a series of maps coloured according to various classification fields. These are the so–called thematic maps.

GRAPH ORIENTATION—CENTRE AND PERIPHERY: The ‘north/south’ and ‘east/west’ orientations of the nodes comprising a graph have no meaning here. In other words, rotating a graph has no effect on it: it remains the same graph. All that matters is the relative distribution of its nodes, their positions with respect to each other (except for unconnected nodes, whose positions are without significance to the graph and which may just as well be represented by a list). The notions of centre and periphery (whether for a graph as a whole or for some sub–graph) are of far greater importance in this regard, since they indicate the degree of influence a node has; the same is true of distance. Although ‘visible’ distance is a source of information, it has no ‘absolute’ value: the actual distance between two nodes on a graph is the number of links one needs to follow to get from one to the other. If the arcs of a graph have a ‘direction’, which is the case for a web graph, the distance from A to B is not necessarily the same as that from B to A.
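The asymmetry of distance on a directed web graph can be shown with a breadth-first search that only follows links in their own direction. A minimal sketch; the `outlinks` dictionary and the three sites are hypothetical.

```python
from collections import deque


def distance(outlinks, start, goal):
    """Number of hyperlinks to follow from start to goal (None if unreachable).

    Breadth-first search along the direction of the links, so the result
    for (A, B) can differ from the result for (B, A).
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in outlinks.get(node, ()):
            if nxt == goal:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


outlinks = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(distance(outlinks, "A", "B"))  # 1
print(distance(outlinks, "B", "A"))  # 2 (B -> C -> A)
```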

IDENTIFYING MAP COMPONENTS—STUDYING CLUSTERING: The first reading of a graph, particularly of a map foundation, consists in identifying its components. First one identifies clusters: groups of sites far more clearly linked to each other than to the rest of the graph. In the absence of a genuine cluster, particularly when a graph is dense and may itself constitute a relatively homogeneous ‘whole’, one may attempt to identify its seemingly denser zones. This can also be approached negatively, by attempting to identify its ‘holes’, its ‘empty zones’. It also involves considering whether the graph has a principal component (often the graph's centre) and a series of ‘subordinate’ ones, or whether it is simply a multiplicity of genuinely ‘independent’ components. On this basis, the researcher may then identify the reasons for these preferential attachments within the graph and come to understand ‘what makes it cluster’. If this is not clear, they should use visualisation by category (cf. infra) and/or ‘quit’ the graph to look for explanations elsewhere, on the sites themselves.
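Two of these checks can be made explicit: splitting the graph into connected components (principal component first), and comparing, for a candidate group of sites, the links inside the group with the links crossing its boundary. This is a minimal sketch under our own assumptions (link direction ignored, hypothetical names and data); it is not the clustering procedure of the Atlas, which relies on visual reading of the maps.

```python
def components(nodes, arcs):
    """Connected components, ignoring link direction, largest first."""
    neighbours = {n: set() for n in nodes}
    for a, b in arcs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(component)
    # The first entry is the principal component, the rest 'subordinate' ones.
    return sorted(result, key=len, reverse=True)


def cohesion(group, arcs):
    """Return (links inside the group, links crossing its boundary)."""
    inside = sum(1 for a, b in arcs if a in group and b in group)
    crossing = sum(1 for a, b in arcs if (a in group) != (b in group))
    return inside, crossing


nodes = ["a", "b", "c", "d", "e", "f"]
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
print(cohesion({"a", "b", "c"}, arcs))  # (3, 1)
```

A group with many more inside links than crossing links is a plausible cluster; how much ‘more’ is enough remains the researcher's judgement.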

IDENTIFYING A COMMUNITY: How does one identify communities in a graph? Note that a cluster is not necessarily a sign of a communal structure in the strictest sense of the word. For there to be a community, it is not enough that there be a few ‘large’ and influential sites (‘hubs’ or ‘authorities’, for which cf. infra); there also (or even only) has to be a multiplicity of ‘small’ sites, well connected among themselves. Simply put, the network should look more like a ‘spider–web’ than a ‘star’.



This (black circle) is a community.



Here there is no community, just a single site linking out to others
which are, themselves, barely or not at all connected to each other.
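The ‘spider-web versus star’ contrast can be given a rough numerical form: set aside the best-connected site and measure what share of the possible links among the remaining sites actually exist. This density measure is our own illustrative heuristic, not a criterion stated by the e–Diasporas Atlas; the function name and data are hypothetical, and link direction is ignored.

```python
from itertools import combinations


def peripheral_density(nodes, arcs):
    """Share of possible links present once the best-connected node is set aside.

    Close to 0: a 'star' (one site linking out to sites not linked to each other).
    Closer to 1: a 'spider-web' of small sites well connected among themselves.
    Link direction is ignored and self-links are dropped.
    """
    edges = {frozenset(arc) for arc in arcs if arc[0] != arc[1]}
    degree = {n: 0 for n in nodes}
    for edge in edges:
        for n in edge:
            degree[n] += 1
    centre = max(nodes, key=degree.get)
    rest = [n for n in nodes if n != centre]
    possible = len(rest) * (len(rest) - 1) // 2
    if possible == 0:
        return 0.0
    present = sum(1 for u, v in combinations(rest, 2) if frozenset((u, v)) in edges)
    return present / possible


nodes = ["hub", "s1", "s2", "s3"]
star = [("hub", "s1"), ("hub", "s2"), ("hub", "s3")]
web = star + [("s1", "s2"), ("s2", "s3"), ("s3", "s1")]
print(peripheral_density(nodes, star))  # 0.0
print(peripheral_density(nodes, web))   # 1.0
```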

IDENTIFYING ‘HUBS’ AND ‘AUTHORITIES’: In a graph, a hub is a node with a large number of links leading from it, ‘a site that cites a lot’, so to speak. An authority is a node with a large number of links leading to it, ‘a site often cited’ or, one might equally say, a site with a lot of influence. Authorities are easily identifiable in the graphs produced in e–Diasporas: they are the biggest nodes on the graph (the size of a node being determined by the number of links entering it). Hubs are also fairly easily recognisable: they are the ones around which ‘stars’ form or, again, the ones surrounded by the most links of the same colour (links take the colour of the node they come from).
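Following the definitions above, hubs and authorities can be ranked by plain degree counts: links leaving a site (hubs) and links entering it (authorities, whose in-degree also sets node size on e–Diasporas maps). The same terms are used by Kleinberg's HITS algorithm, which computes hub and authority scores iteratively; the sketch below does not implement HITS, only the simple counts described here. Function name and data are hypothetical.

```python
from collections import Counter


def hubs_and_authorities(arcs, top=1):
    """Rank sites by links leaving them (hubs) and links entering them (authorities)."""
    arcs = set(arcs)  # count each distinct hyperlink once
    out_degree = Counter(source for source, _ in arcs)  # 'a site that cites a lot'
    in_degree = Counter(target for _, target in arcs)   # 'a site often cited'
    return out_degree.most_common(top), in_degree.most_common(top)


arcs = [("portal", "a"), ("portal", "b"), ("portal", "c"), ("portal", "news"),
        ("a", "news"), ("b", "news"), ("c", "news")]
hubs, authorities = hubs_and_authorities(arcs)
print(hubs)         # [('portal', 4)]
print(authorities)  # [('news', 4)]
```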






‘Bridge’ sites are sites that play the role of intermediaries between two or more components or clusters in a graph. In the e–Diasporas Atlas, where nodes are precisely qualified and clusters regularly correspond to categories defined by the researchers themselves, the positioning and the interpretation of the content of such sites are often of prime importance. One should reflect on why such–and–such a site has this function within a graph and what it means in terms of the relationships between the various ‘actors’ in the e–diaspora.



This map is part of the ‘Moroccans’ corpus.
The sites yabiladi.com and bladi.net (in the centre) very clearly play the role of ‘bridges’ here.
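Once clusters have been labelled, a crude way to list bridge candidates is to look for sites whose links, in either direction, touch at least two different clusters. This heuristic is our own simplification; a fuller measure of intermediary position would be betweenness centrality, which this sketch does not compute. The function name, the `min_clusters` parameter and the data are hypothetical.

```python
def bridge_candidates(arcs, cluster_of, min_clusters=2):
    """Sites whose links (in either direction) touch at least `min_clusters` clusters.

    cluster_of maps already-clustered sites to a cluster label; a site
    need not belong to any cluster itself to be reported.
    """
    touched = {}
    for a, b in arcs:
        for node, other in ((a, b), (b, a)):
            if other in cluster_of:
                touched.setdefault(node, set()).add(cluster_of[other])
    return sorted(n for n, clusters in touched.items() if len(clusters) >= min_clusters)


cluster_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
arcs = [("a1", "a2"), ("b1", "b2"), ("x", "a1"), ("x", "b1")]
print(bridge_candidates(arcs, cluster_of))  # ['x']
```

The output is only a shortlist: as the text stresses, the positioning and content of each candidate still have to be interpreted.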

MAPS—STUDYING POLARISATION: Having identified the clusters/zones of the graph, one then has to identify the reasons for these groupings; to this end, the researcher may use the various thematic maps stemming from their classification. Browsing these different maps, they first set out to find trivial groupings, that is, situations where clusters correspond very clearly to the various values of a given classification field. Sometimes groupings of this kind are not observable (at least for certain clusters). One then has to look for the phenomenon of polarisation by studying the distribution of the graph's ‘colours’ (categories), examining how these various poles are distributed and how they ‘constitute links’ between the different clusters.



Lower left: here we have a trivial grouping; the cluster brings together sites of almost exclusively one category.
Right: here we have a zone (more than one cluster) containing sites belonging to several categories.
In order to interpret a grouping of this nature, we will need to explore other thematic maps.




We speak of ‘intruder’ sites when a given cluster corresponds very obviously to one category but nevertheless contains one or more sites belonging to another. One then has to ask why such a site should have inserted itself into a domain which is, a priori, not its own. Once again, the intruding site may help in understanding the relationships between the various actors in an e–diaspora.



It is quite right to wonder at the presence of the green (= non-religious) site in the very heart of a cluster of orange (= religious) ones, especially since it also acts as a hub.
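Intruders can be listed by finding, in each clearly single-category cluster, the sites whose category differs from the dominant one; reporting their out-degree alongside makes an intruder that also acts as a hub, as in the example above, stand out. The threshold for ‘very obviously’ one category, the function name and the data are hypothetical.

```python
from collections import Counter


def intruders(cluster_of, category_of, arcs, threshold=0.8):
    """Sites whose category differs from that of a clearly single-category cluster.

    Returns (site, out_degree) pairs, most-citing first, so an intruder that
    also acts as a hub stands out. threshold is a hypothetical cut-off.
    """
    sites_in = {}
    for site, cluster in cluster_of.items():
        sites_in.setdefault(cluster, []).append(site)
    out_degree = Counter(source for source, _ in set(arcs))
    found = []
    for sites in sites_in.values():
        dominant, n = Counter(category_of[s] for s in sites).most_common(1)[0]
        if n / len(sites) < threshold:
            continue  # mixed cluster: no single category to intrude upon
        found.extend((s, out_degree[s]) for s in sites if category_of[s] != dominant)
    return sorted(found, key=lambda pair: pair[1], reverse=True)


cluster_of = {s: "orange" for s in ["r1", "r2", "r3", "r4", "r5", "g"]}
category_of = {s: "religious" for s in ["r1", "r2", "r3", "r4", "r5"]}
category_of["g"] = "non-religious"
arcs = [("g", "r1"), ("g", "r2"), ("g", "r3"), ("g", "r4"), ("r1", "r2")]
print(intruders(cluster_of, category_of, arcs))  # [('g', 4)]
```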