Presentations

2020

*

The Hermeneutic Circle of Data Visualization: the Case Study of the Affinity Map

D. Rodighiero; A. Romele

Techné: Research in Philosophy and Technology. 2020.

In this article, we show how postphenomenology can be used to analyze a visual method that reveals the hidden dynamics existing between individuals within large organizations. We use the Affinity Map to expand classic postphenomenology, which privileges a ‘linear’ understanding of technological mediations, by introducing the notions of ‘iterativity’ and ‘collectivity.’ In the first section, both classic and more recent descriptions of human-technology-world relations are discussed to transcendentally approach the discipline of data visualization. In the second section, the Affinity Map case study is used to stress three elements: 1) the collection of data and the design process; 2) the visual grammar of the data visualization; and 3) the process of self-recognition for the map ‘reader.’ In the third section, we introduce the hermeneutic circle of data visualization. Finally, in the concluding section, we suggest how the Affinity Map might be seen as the material encounter between postphenomenology, actor-network theory (ANT), and hermeneutics, through ethical and political multistability.

2019

*

Traduire les données en images

D. Rodighiero

Journée d'étude : Imagination, imaginaire et images des (big) data, Université de Lille, France, January 24, 2019.

*

Traduire les données en images

D. Rodighiero

Séminaire d'écritures numériques et éditorialisation, CNAM Paris and Université de Montréal, January 17, 2019.

*

Translating Data into Images

D. Rodighiero

Séminaire du médialab, Sciences Po, Paris, France, January 15, 2019.

*

Frederic Kaplan Isabella di Lenardo

F. Kaplan; I. di Lenardo

Apollo, The International Art Magazine. 2019-01-01.

2018

*

Informatica per Umanisti: da Venezia al mondo intero attraverso l’Europa

D. Rodighiero

Conferenza per la Società Dante Alighieri, University of Bern, Switzerland, December 10, 2018.

At a time when the scientific world is opening up to a broader public, this lecture is meant as an accessible introduction to the digital humanities. Its subject is computer science for humanists, a new field of research that enriches the humanistic disciplines through the use of new technologies. My personal experience will serve as the guiding thread of this introduction, and the lecture will be an occasion to present the projects I have contributed to over the last five years. From Paris to Venice, from Lausanne to Boston, doing research means gathering experience all over the world. I will talk about Bruno Latour and his modes of existence, Frédéric Kaplan and his time machine, Franco Moretti and his distant reading, and Marilyne Andersen and her cartography of affinities: all people I have had the pleasure of meeting and who have enriched my academic path. Through a visual narrative made of images and videos, I will explain how the digital humanities can make archives, museums, and libraries more interesting places for everyone.

*

The Value of Concepts in the Shared Design Practice

D. Rodighiero

Visio et cognitio. Representations of Knowledge, from Medieval to Digital, Porto, Portugal, November 26–27, 2018.

Gilles Deleuze used to say that philosophers are creators of concepts, extracted from a continuous flux of thinking (Deleuze 1980). During a period of intense exchange between disciplines in the 20th century, these concepts were not confined to philosophy: they were employed, in different ways, in computer science and design. On the one hand, computer science uses concepts to stabilize technological language and to design architectures (Ciborra 2004); on the other hand, design uses concepts as a creative method in the design process (MoMA 1972). This presentation aims to establish a common ground between these views, a shared perspective that might be useful for the creation of design objects (visualizations, books, and websites) within the project.

Bibliography:
- Ciborra, Claudio. 2004. The Labyrinths of Information: Challenging the Wisdom of Systems. Oxford: Oxford University Press.
- Deleuze, Gilles. 1980. “Dernière Année, Vincennes.” Les Cours de Gilles Deleuze. https://www.webdeleuze.com/textes/48.
- Museum of Modern Art (MoMA). 1972. Italy: The New Domestic Landscape. Achievements and Problems of Italian Design. Edited by Emilio Ambasz. Greenwich, CT: New York Graphic Society. www.moma.org/calendar/exhibitions/1783.

*

Translating Data into Images: The Design Process for the Creation of Visualizations

D. Rodighiero

Séminaire du LaDHUL: faire des SHS avec le numérique, University of Lausanne (UNIL), Switzerland, November 21, 2018.

The design process is a series of endeavors aimed at solving a problem, where the problem is broadly defined as a need, a task, a situation, and so on. Visualizations are objects resulting from a design process that solves the problem of unreadable data. They are technical mediators that transform data into graphics through a precise authorship that defines their social and political context. One way to understand visualizations is to analyze their design process: this approach, applied to a case study called the Affinity Map, unveils the complexity of the object and reveals the reasons for its creation.

*

Hexagons, Satellites and Semantic Background

D. Rodighiero

Micro Meso Macro, École normale supérieure de Lyon, France, November 15–16, 2018.

The presentation focuses on a visual method that allows for a hexagonal arrangement in network visualization. Hexagonal tiling is a way to articulate the betweenness of nodes, enriching the information that a network visualization can convey: what is usually mere background is used to show node context and semantic information. This visual method is meant to prompt reflection on the visual representation of networks, which needs further development and ideas.
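As an illustration of the arrangement described above, here is a minimal sketch of snapping node positions onto a hexagonal grid, assuming pointy-top hexagons and hypothetical helper names (hex_center, snap_to_hex); the geometry actually used in the presentation may differ.

```python
import math

def hex_center(col, row, size):
    """Center of a pointy-top hexagon in an offset grid (odd rows shifted right)."""
    width = math.sqrt(3) * size            # horizontal distance between columns
    x = col * width + (row % 2) * width / 2
    y = row * 1.5 * size                   # vertical distance between rows
    return x, y

def snap_to_hex(x, y, size):
    """Snap an arbitrary node position to the nearest hexagon center."""
    row = round(y / (1.5 * size))
    width = math.sqrt(3) * size
    col = round((x - (row % 2) * width / 2) / width)
    return hex_center(col, row, size)

print(snap_to_hex(10.3, 7.8, size=2.0))    # e.g. a node placed by a force layout
```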

*

Affinity Map

D. Rodighiero; O. Maitre

ENAC General Assembly, EPFL, Switzerland, November 8, 2018.

This is the final presentation of the Affinity Map, a data visualization at the disposal of ENAC members for visualizing their own scientific community.

*

Introduction to Data Visualization

D. Rodighiero

Kit de survie en milieu numérique pour l’étudiant en SHS, Institut d'histoire de l'art (INHA), Paris, October 3, 2018.

This presentation is an introduction to the domain of data visualization. Its core is a series of examples organized in three time-based sections: classic visualizations (18th and 19th centuries), modern visualizations (20th century), and contemporary visualizations (21st century). The presentation is not intended to be historically exhaustive; the intent is rather to introduce, in temporal sequence, the subjects of discussion that make these examples interesting.

*

dhSegment: A generic deep-learning approach for document segmentation

S. Ares Oliveira; B. L. A. Seguin; F. Kaplan

The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA, 5-8 August 2018.

*

Comparing human and machine performances in transcribing 18th century handwritten Venetian script

S. Ares Oliveira; F. Kaplan

2018-07-26. Digital Humanities Conference, Mexico City, Mexico, June 24-29, 2018.

Automatic transcription of handwritten texts has made important progress in recent years. This increase in performance, essentially due to new architectures combining convolutional neural networks with recurrent neural networks, opens new avenues for searching in large databases of archival and library records. This paper reports on our recent progress in making millions of digitized Venetian documents searchable, focusing on a first subset of 18th century fiscal documents from the Venetian State Archives. For this study, about 23’000 image segments containing 55’000 Venetian names of persons and places were manually transcribed by archivists trained to read this kind of handwritten script. This annotated dataset was used to train and test a deep learning architecture with a performance level (about 10% character error rate) that is satisfactory for search use cases. This paper compares this level of reading performance with the reading capabilities of Italian-speaking transcribers. More than 8,500 new human transcriptions were produced, confirming that the amateur transcribers were not as good as the experts. On average, however, the machine outperforms the amateur transcribers in this transcription task.
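The roughly 10% character error rate (CER) quoted above is, in essence, an edit distance normalized by the length of the reference transcription. A minimal sketch of the usual computation follows; the function names and example strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# Two edits over a 16-character reference: CER = 0.125
print(character_error_rate("Zuanne Contarini", "Zuane Comtarini"))
```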

*

Comparing human and machine performances in transcribing 18th century handwritten Venetian script

S. Ares Oliveira

Digital Humanities Conference, Mexico City, Mexico, June 25-29, 2018.

Automatic transcription of handwritten texts has made important progress in recent years. This increase in performance, essentially due to new architectures combining convolutional neural networks with recurrent neural networks, opens new avenues for searching in large databases of archival and library records. This paper reports on our recent progress in making millions of digitized Venetian documents searchable, focusing on a first subset of 18th century fiscal documents from the Venetian State Archives. For this study, about 23’000 image segments containing 55’000 Venetian names of persons and places were manually transcribed by archivists trained to read this kind of handwritten script. This annotated dataset was used to train and test a deep learning architecture with a performance level (about 10% character error rate) that is satisfactory for search use cases. This paper compares this level of reading performance with the reading capabilities of Italian-speaking transcribers. More than 8,500 new human transcriptions were produced, confirming that the amateur transcribers were not as good as the experts. On average, however, the machine outperforms the amateur transcribers in this transcription task.

*

The Scholar Index: Towards a Collaborative Citation Index for the Arts and Humanities

G. Colavizza; M. Romanello; M. Babetto; V. Barbay; L. Bolli et al.

Mexico City, 26-29 June 2018.

*

Using Networks to Visualize Publications

D. Rodighiero

EUROLIB General Assembly, Joint Research Centre of European Commission - Ispra (VA), Italy, 30 May - 1 June 2018.

Retrieval systems are often shaped as lists organized in pages. However, the majority of users look only at the first page and ignore the rest. This presentation concerns an alternative way of presenting the results of a query: network visualizations.
The presentation includes a case study concerning a school of management. Its entire publication record is arranged in a network visualization according to lexical proximity, based on a technique called Term Frequency – Inverse Document Frequency (TF-IDF). The extracted terms are further used to fill the space between the network nodes, creating a sort of semantic background. The case study shows the pros and cons of such a visual representation through practical examples of term extraction and visualization interaction.
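A minimal sketch of the term-extraction step, assuming scikit-learn's TfidfVectorizer and a toy stand-in corpus; the documents and the choice of the top three terms per publication are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for publication abstracts (hypothetical data).
docs = [
    "supply chain management and logistics optimization",
    "consumer behavior in digital marketing channels",
    "logistics networks and supply chain resilience",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# The top terms per document could feed the semantic background layer.
for i in range(tfidf.shape[0]):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:3]
    print(i, [terms[j] for j in top])
```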

*

Deep Learning for Logic Optimization Algorithms

W. J. Haaswijk; E. Collins; B. Seguin; M. Soeken; F. Kaplan et al.

2018-05-27. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, May 27-30, 2018. p. 1-4.

DOI : 10.1109/ISCAS.2018.8351885.

The slowing down of Moore's law and the emergence of new technologies put increasing pressure on the field of EDA. There is a constant need to improve optimization algorithms. However, finding and implementing such algorithms is a difficult task, especially with the novel logic primitives and potentially unconventional requirements of emerging technologies. In this paper, we cast logic optimization as a deterministic Markov decision process (MDP). We then take advantage of recent advances in deep reinforcement learning to build a system that learns how to navigate this process. Our design has a number of desirable properties. It is autonomous because it learns automatically and does not require human intervention. It generalizes to large functions after training on small examples. Additionally, it intrinsically supports both single- and multi-output functions, without the need to handle special cases. Finally, it is generic because the same algorithm can be used to achieve different optimization objectives, e.g., size and depth.
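To make the MDP framing concrete, here is a toy sketch under stated assumptions: states are Boolean expressions encoded as token tuples, actions are local rewrite rules, and the reward is the reduction in expression size. A greedy policy stands in for the paper's learned deep reinforcement learning policy, and the two rewrite rules are hypothetical.

```python
# Each rule maps a token pattern to its replacement.
REWRITES = [
    (("not", "not"), ()),            # double-negation elimination
    (("and", "x", "x"), ("x",)),     # idempotence (toy, variable-specific)
]

def actions(state):
    """Enumerate successor states reachable by one rewrite (the MDP is deterministic)."""
    for lhs, rhs in REWRITES:
        n = len(lhs)
        for i in range(len(state) - n + 1):
            if state[i:i + n] == lhs:
                yield state[:i] + rhs + state[i + n:]

def greedy_optimize(state):
    """Follow the action with the best immediate reward (size decrease)."""
    while True:
        successors = list(actions(state))
        if not successors:
            return state
        best = min(successors, key=len)
        if len(best) >= len(state):
            return state
        state = best

print(greedy_optimize(("not", "not", "and", "x", "x")))  # -> ('x',)
```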

*

Automatic information extraction from historical collections: the case of the 1808 Venetian cadaster

S. Ares Oliveira

CREATE Salon ‘Heterogeneous archives’, Amsterdam, Netherlands, April 5, 2018.

The presentation reports on ongoing work to automatically process heterogeneous historical documents. After a quick overview of the general processing pipeline, a few examples are described more comprehensively. The recent progress in making large collections of digitised documents searchable is also presented, through the results of the automatic transcription of named entities in 18th century Venetian fiscal documents. Finally, the case of the 1808 Venetian cadaster is used to illustrate the general approach, and the results of the processing of the whole cadaster are presented.

*

Detecting Text Reuse in Newspapers Data with Passim

M. Romanello

Hacking the News, co-located with DHNordic 2018, Helsinki, Finland, 5-6 March 2018.

*

Mapping Affinities in Academic Organizations

D. Rodighiero; F. Kaplan; B. Beaude

Frontiers in Research Metrics and Analytics. 2018-02-19.

DOI : 10.3389/frma.2018.00004.

Scholarly affinities are one of the most fundamental hidden dynamics that drive scientific development. Some affinities are actual, and consequently can be measured through classical academic metrics such as co-authoring. Other affinities are potential, and therefore do not leave visible traces in information systems; for instance, some peers may share interests without actually knowing it. This article illustrates the development of a map of affinities for academic collectives, designed to be relevant to three audiences: the management, the scholars themselves, and the external public. Our case study involves the School of Architecture, Civil and Environmental Engineering of EPFL, hereinafter ENAC. The school consists of around 1,000 scholars, 70 laboratories, and 3 institutes. The actual affinities are modeled using the data available from the information systems reporting publications, teaching, and advising scholars, whereas the potential affinities are addressed through text mining of the publications. The major challenge for designing such a map is to represent the multi-dimensionality and multi-scale nature of the information. The affinities are not limited to the computation of heterogeneous sources of information; they also apply at different scales. The map, thus, shows local affinities inside a given laboratory, as well as global affinities among laboratories. This article presents a graphical grammar to represent affinities. Its effectiveness is illustrated by two actualizations of the design proposal: an interactive online system in which the map can be parameterized, and a large-scale carpet of 250 square meters. In both cases, we discuss how the materiality influences the representation of data, in particular the way key questions could be appropriately addressed considering the three target audiences: the insights gained by the management and their consequences in terms of governance, the understanding of the scholars’ own positioning in the academic group in order to foster opportunities for new collaborations and, eventually, the interpretation of the structure from a general public to evaluate the relevance of the tool for external communication.
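As a small illustration of how the actual affinities could be derived from publication records, here is a sketch that builds a weighted co-authorship graph with networkx; the records and author identifiers are hypothetical, and the published map additionally combines teaching, advising, and text-mined potential affinities.

```python
import itertools
import networkx as nx

# Hypothetical publication records, each listing author identifiers.
publications = [
    {"authors": ["lab1.alice", "lab2.bob"]},
    {"authors": ["lab1.alice", "lab2.bob", "lab3.carol"]},
    {"authors": ["lab3.carol", "lab1.dave"]},
]

G = nx.Graph()
for pub in publications:
    for a, b in itertools.combinations(sorted(pub["authors"]), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1       # one more shared publication
        else:
            G.add_edge(a, b, weight=1)

# Edge weights encode co-authoring affinity; a layout gives the map a geometry.
pos = nx.spring_layout(G, weight="weight", seed=0)
print(sorted(G.edges(data="weight")))
```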

*

Negentropic linguistic evolution: A comparison of seven languages

V. Buntinx; F. Kaplan

2018. Digital Humanities 2018, Mexico City, Mexico, June 26-29, 2018.

The relationship between the entropy of language and its complexity has been the subject of much speculation, with some seeing the increase of linguistic entropy as a sign of linguistic complexification, and others interpreting entropy drop as a marker of greater regularity. Some evolutionary explanations, like the learning bottleneck hypothesis, argue that communication systems with more regular structures tend to have evolutionary advantages over more complex structures. Other structural effects of communication networks, like the globalization of exchanges or algorithmic mediation, have been hypothesized to have a regularization effect on language. Longer-term studies are now possible thanks to the arrival of large-scale diachronic corpora, like newspaper archives or digitized libraries. However, simple analyses of such datasets are prone to misinterpretation due to significant variations of corpus size over the years and the indirect effect this can have on various measures of language change and linguistic complexity. In particular, it is important not to misinterpret the arrival of new words as an increase in complexity, as this variation is intrinsic, as is the variation of corpus size. This paper is an attempt to conduct an unbiased diachronic study of linguistic complexity over seven different languages using the Google Books corpus. The paper uses a simple entropy measure on a closed, but nevertheless large, subset of words, called the kernel. The kernel contains only the words that are present without interruption for the whole length of the study, which excludes all the words that appeared or disappeared during the period. We argue that this method is robust to variations of corpus size and makes it possible to study changes in complexity despite possible (and, in the case of Google Books, unknown) changes in the composition of the corpus. Indeed, the evolution observed for the seven different languages shows rather different patterns that are not directly correlated with the evolution of the size of the respective corpora. The rest of the paper presents the methods followed, the results obtained, and the next steps we envision.
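A compact sketch of the kernel idea under stated assumptions: the kernel keeps only the words present in every year of the period, and a Shannon entropy is computed over the frequencies of kernel words; the function names and miniature corpus are hypothetical.

```python
import math
from collections import Counter

def kernel(yearly_tokens):
    """Words present in every year of the study period."""
    years = list(yearly_tokens.values())
    common = set(years[0])
    for tokens in years[1:]:
        common &= set(tokens)
    return common

def kernel_entropy(tokens, kernel_words):
    """Shannon entropy (bits) of the frequency distribution of kernel words."""
    counts = Counter(t for t in tokens if t in kernel_words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

corpus = {  # hypothetical miniature diachronic corpus
    1850: "the king and the state and the law".split(),
    1900: "the state and the railways and the law".split(),
    1950: "the state and the television and the law".split(),
}
K = kernel(corpus)
for year, tokens in corpus.items():
    print(year, round(kernel_entropy(tokens, K), 3))
```

Because the word set is fixed across years, the measure is insensitive to words entering or leaving the corpus, which is precisely the robustness the abstract claims.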

*

dhSegment: A generic deep-learning approach for document segmentation

S. A. Oliveira; B. Seguin; F. Kaplan

2018-01-01. 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, Aug 05-08, 2018. p. 7-12.

DOI : 10.1109/ICFHR-2018.2018.00011.

In recent years there have been multiple successful attempts at tackling document processing problems separately, by designing task-specific hand-tuned strategies. We argue that the diversity of historical document processing tasks makes solving them one at a time impractical and shows the need for generic approaches able to handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis, and the extraction of multiple typologies of illustrations and photographs. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple and standard reusable operations, adding to the flexibility of our approach.
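As an example of the simple, reusable post-processing operations mentioned above, here is a minimal sketch of one generic block (thresholding a pixel-wise probability map and keeping large connected components), using NumPy and SciPy; this is not the dhSegment API, and the fake CNN output is for illustration only.

```python
import numpy as np
from scipy import ndimage

def extract_regions(prob_map, threshold=0.5, min_area=20):
    """Binarize a pixel-wise probability map and return bounding boxes
    of the connected components larger than min_area pixels."""
    binary = prob_map > threshold
    labels, n = ndimage.label(binary)        # 4-connectivity by default
    regions = []
    for i in range(1, n + 1):
        mask = labels == i
        if mask.sum() >= min_area:
            ys, xs = np.nonzero(mask)
            regions.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return regions

# A fake CNN output: one bright rectangular blob on a low-probability background.
rng = np.random.default_rng(0)
prob = rng.uniform(0, 0.3, size=(100, 100))
prob[20:60, 30:80] = 0.9
print(extract_regions(prob))                 # [(30, 20, 79, 59)]
```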

*

Deep Learning for Logic Optimization Algorithms

W. Haaswijk; E. Collins; B. Seguin; M. Soeken; F. Kaplan et al.

2018-01-01. IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, May 27-30, 2018.

The slowing down of Moore's law and the emergence of new technologies put increasing pressure on the field of EDA. There is a constant need to improve optimization algorithms. However, finding and implementing such algorithms is a difficult task, especially with the novel logic primitives and potentially unconventional requirements of emerging technologies. In this paper, we cast logic optimization as a deterministic Markov decision process (MDP). We then take advantage of recent advances in deep reinforcement learning to build a system that learns how to navigate this process. Our design has a number of desirable properties. It is autonomous because it learns automatically and does not require human intervention. It generalizes to large functions after training on small examples. Additionally, it intrinsically supports both single- and multi-output functions, without the need to handle special cases. Finally, it is generic because the same algorithm can be used to achieve different optimization objectives, e.g., size and depth.

*

Making large art historical photo archives searchable

B. L. A. Seguin / F. Kaplan; I. di Lenardo (Dir.)

Lausanne, EPFL, 2018.

DOI : 10.5075/epfl-thesis-8857.

In recent years, museums, archives and other cultural institutions have initiated important programs to digitize their collections. Millions of artefacts (paintings, engravings, drawings, ancient photographs) are now represented in digital photographic format. Furthermore, through progress in standardization, a growing portion of these images are now available online in an easily accessible manner. This thesis studies how such large-scale art history collections can be made searchable using new deep learning approaches for processing and comparing images. It takes as a case study the processing of the photo archive of the Giorgio Cini Foundation, where more than 300'000 images have been digitized. We demonstrate how a generic processing pipeline can reliably extract the visual and textual content of scanned images, opening up ways to efficiently digitize large photo collections. Then, by leveraging an annotated graph of visual connections, a metric is learnt that allows clustering and searching through artwork reproductions independently of their medium, effectively solving a difficult problem of cross-domain image search. Finally, the thesis studies how a web interface allows users to perform different searches based on this metric. We also evaluate the process by which users can annotate elements of interest during their navigation, to be added to the database, allowing the system to be trained further and give better results. By documenting a complete approach for going from a physical photo archive to a state-of-the-art navigation system, this thesis paves the way for a global search engine across the world's photo archives.

*

The Intellectual Organisation of History

G. Colavizza / F. Kaplan; M. Franceschet (Dir.)

Lausanne, EPFL, 2018.

DOI : 10.5075/epfl-thesis-8537.

A tradition of scholarship discusses the characteristics of different areas of knowledge, in particular after modern academia compartmentalized them into disciplines. The academic approach is often put into question: are there two or more cultures? Is ever-increasing specialization the only way to cope with information abundance, or are holistic approaches helpful too? What is happening with the digital turn? If these questions are well studied for the sciences, our understanding of how the humanities might differ in their own respect is far less advanced. In particular, modern academia might foster specific patterns of specialization in the humanities. Eventually, the recent rise in the application of digital methods to research, known as the digital humanities, might be introducing structural adaptations through the development of shared research technologies and the advent of organizational practices such as the laboratory. It therefore seems timely and urgent to map the intellectual organization of the humanities. This investigation depends on a few traits such as the level of codification, the degree of agreement among scholars, and the level of coordination of their efforts. These characteristics can be studied by measuring their influence on the outcomes of scientific communication. In particular, this thesis focuses on history as a discipline, using bibliometric methods. In order to explore history in its complexity, an approach to creating collaborative citation indexes in the humanities is proposed, resulting in a new dataset comprising monographs, journal articles and citations to primary sources. Historians' publications were found to organize thematically and chronologically, sharing a limited set of core sources across small communities. Core sources act in two ways with respect to the intellectual organization: locally, by adding connectivity within communities, or globally, as weak ties across communities. Over recent decades, fragmentation has been on the rise in the intellectual networks of historians, and a comparison across a variety of specialisms from the human, natural and mathematical sciences revealed the fragility of such networks across the axes of citation and textual similarity. Humanists organize into more, smaller, and more scattered topical communities than scientists. A characterisation of history is eventually proposed. Historians produce new historiographical knowledge with a focus on evidence or interpretation. The former aims at providing the community with an agreed-upon factual resource; interpretive work is instead mainly focused on creating novel perspectives. A second axis refers to two modes of exploration of new ideas: in-breadth, where novelty comes from adding new, previously unknown pieces to the mosaic, or in-depth, where novelty comes from improving on previous results. While all combinations are possible, historians tend to focus on in-breadth interpretations, with the immediate consequence that growth accentuates intellectual fragmentation in the absence of further consolidating factors such as theory or technologies. Research on evidence might have a different impact by potentially scaling up in the digital space, and in so doing influence the modes of interpretation in turn. This process is not dissimilar to the gradual rise in importance of research technologies and collaborative competition in the mathematical and natural sciences. This is perhaps the promise of the digital humanities.

*

The Hermeneutic Circle of Data Visualization

D. Rodighiero

Human-Technology Relations, University of Twente, 11-13 July 2018.

According to Don Ihde (1990, 80–97), hermeneutic relations are a specific kind of technologically mediated I-world relations in which the technology must be “read” and “interpreted” in order to access the world. More recently, Peter-Paul Verbeek (2010, 145) introduced the notion of composite relations, featuring a double intentionality performed by human and non-human actors. The aim of this article is to expand these two concepts by reflecting on data visualization. In particular, we deal with a visualization called the Affinity Map, which displays the scholars of EPFL (École Polytechnique Fédérale de Lausanne) arranged according to a metric based on collaboration. Two specificities characterize, for us, such a configuration. 1) First, the subjectivization of what hermeneutics has called the “world of the text,” because scholars are both readers and contributors; in other words, hermeneutic relations are here technologically mediated self-relations in a strong sense. 2) Second, the collectivization of the hermeneutic circle in each of its steps: subjects, data, designers, actualizations, visualizations, and readers. In this respect, the Affinity Map might be seen as a concrete encounter between postphenomenology, whose main focus is on the types of I-technology-world relations, and actor-network theory, which has always-already been concerned with collectives.

*

Printing Walkable Visualizations

D. Rodighiero

2018. 5th Biennial Research Transdisciplinary Imaging Conference, Edinburgh, UK, April 18-20, 2018.

DOI : 10.6084/m9.figshare.6104693.v2.

This article concerns a specific actor in the actualization process: the medium. The conventional medium for visualizations is the computer screen, a visual device that supports the practices of design and reading. However, visualizations also appear in other ways, for example as posters, articles, books, or projections. This article focuses, in particular, on a rather unusual medium: the floor, or walkable, visualization.

*

Mapping affinities: visualizing academic practice through collaboration

D. Rodighiero / F. Kaplan; B. Beaude (Dir.)

EPFL, 2018.

DOI : 10.5075/epfl-thesis-8242.

Academic affinities are one of the most fundamental hidden dynamics that drive scientific development. Some affinities are actual, and consequently can be measured through classical academic metrics such as co-authoring. Other affinities are potential, and therefore do not leave visible traces in information systems; for instance, some peers may share scientific interests without actually knowing it. This thesis illustrates the development of a map of affinities for scientific collectives, intended to be relevant to three audiences: the management, the scholars themselves, and the external public. Our case study involves the School of Architecture, Civil and Environmental Engineering of EPFL, which consists of three institutes, seventy laboratories, and around one thousand employees. The actual affinities are modeled using the data available from the academic systems reporting publications, teaching, and advising, whereas the potential affinities are addressed through text mining of the documents registered in the information system. The major challenge in designing such a map is to represent the multi-dimensionality and multi-scale nature of the information. The affinities are not limited to the computation of heterogeneous sources of information; they also apply at different scales. Therefore, the map shows local affinities inside a given laboratory, as well as global affinities among laboratories. The thesis presents a graphical grammar to represent affinities. This graphical system is actualized in several embodiments, among which a large-scale carpet of 250 square meters and an interactive online system in which the map can be parameterized. In both cases, we discuss how the actualization influences the representation of data, in particular the way key questions could be appropriately addressed considering the three target audiences: the insights gained by the management and the related decisions, the understanding of the researchers’ own positioning in the academic collective that might reveal opportunities for new synergies, and, eventually, the interpretation of the structure from an external standpoint, suggesting the relevance of the tool for communication.

*

Printing materials and technologies in the 15th−17th century book production: an undervalued research field

F. Albertin; E. Balliana; G. Pizzol; G. Colavizza; E. Zendri et al.

Microchemical Journal. 2018.

DOI : 10.1016/j.microc.2017.12.010.

We present a systematic non-invasive investigation of a large corpus of early printed books, exploiting multiple techniques. This work is part of a broader project -- Argeia -- aiming to study early printing technologies, their evolution and, potentially, the identification of physical/chemical fingerprints of different manufactures and/or printing dates. We analyzed sixty volumes, part of the important collection of the Ateneo Veneto in Venice (Italy), printed between the 15th and the 17th centuries in the main European manufacturing centers. We present here the results of the imaging analysis of the entire corpus and the X-Ray Fluorescence (XRF) investigation performed, focusing on the XRF data and their statistical treatment using a combination of Principal Component Analysis (PCA) and Logistic Regression. Thanks to the broad XRF investigation -- more than 200 data points -- and to the multidisciplinary approach, we were able to discriminate the provenances of the paper -- in particular for the German and Venetian volumes -- and we potentially identified a chemical fingerprint of Venetian papers.
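A compact sketch of the statistical treatment described (PCA followed by logistic regression), using scikit-learn on synthetic stand-in spectra; the channel count, class labels, and distributions are assumptions for illustration, not the Argeia data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical XRF spectra: rows are measurement points, columns are energy channels.
rng = np.random.default_rng(1)
venetian = rng.normal(loc=1.0, scale=0.2, size=(40, 128))
german = rng.normal(loc=1.3, scale=0.2, size=(40, 128))
X = np.vstack([venetian, german])
y = np.array(["venetian"] * 40 + ["german"] * 40)

# Reduce the spectra to a few principal components, then classify provenance.
model = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))                     # training accuracy on the toy data
```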

*

Characterizing in-text citations in scientific articles: A large-scale analysis

K. W. Boyack; N. J. van Eck; G. Colavizza; L. Waltman

Journal of Informetrics. 2018.

DOI : 10.1016/j.joi.2017.11.005.

We report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time, textual progression, and scientific field. The purpose of this study is to understand the characteristics of in-text citations in a detailed way prior to pursuing other studies focused on answering more substantive research questions. As such, we have analyzed in-text citations in several ways and report many findings here. Perhaps most significantly, we find that there are large field-level differences that are reflected in position within the text, citation interval (or reference age), and citation counts of references. In general, the fields of Biomedical and Health Sciences, Life and Earth Sciences, and Physical Sciences and Engineering have similar reference distributions, although they vary in their specifics. The two remaining fields, Mathematics and Computer Science and Social Science and Humanities, have different reference distributions from the other three fields and between themselves. We also show that in all fields the numbers of sentences, references, and in-text mentions per article have increased over time, and that there are field-level and temporal differences in the numbers of in-text mentions per reference. A final finding is that references mentioned only once tend to be much more highly cited than those mentioned multiple times.

*

The Closer the Better: Similarity of Publication Pairs at Different Co-Citation Levels

G. Colavizza; K. W. Boyack; N. J. van Eck; L. Waltman

Journal of the Association for Information Science and Technology. 2018.

DOI : 10.1002/asi.23981.

We investigate the similarities of pairs of articles which are co-cited at the different co-citation levels of the journal, article, section, paragraph, sentence and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors), and proximity in publication time all rise monotonically as the co-citation level gets lower (from journal to bracket). While the main gain in similarity happens when moving from journal to article co-citation, all level changes entail an increase in similarity, especially section to paragraph and paragraph to sentence/bracket levels. We compare results from four journals over the years 2010-2015: Cell, the European Journal of Operational Research, Physics Letters B and Research Policy, with consistent general outcomes and some interesting differences. Our findings motivate the use of granular co-citation information as defined by meaningful units of text, with implications for, among others, the elaboration of maps of science and the retrieval of scholarly literature.

2017

*

Machine Vision Algorithms on Cadaster Plans

S. Ares Oliveira

PlatformDH, Antwerp, Belgium, December 4, 2017.

*

Layout Analysis on Newspaper Archives

V. Buntinx; F. Kaplan; A. Xanthos

2017. Digital Humanities 2017, Montreal, Canada, August 8-11, 2017.

The study of newspaper layout evolution through historical corpora has been addressed by diverse qualitative and quantitative methods in the past few years. The recent availability of large corpora of newspapers is now making the quantitative analysis of layout evolution ever more popular. This research investigates a method for the automatic detection of layout evolution on scanned images with a factorial analysis approach. The notion of eigenpages is defined by analogy with eigenfaces used in face recognition processes. The corpus of scanned newspapers that was used contains 4 million press articles, covering about 200 years of archives. This method can automatically detect layout changes of a given newspaper over time, rebuilding a part of its past publishing strategy and retracing major changes in its history in terms of layout. Besides these advantages, it also makes it possible to compare several newspapers at the same time and therefore to compare the layout changes of multiple newspapers based only on scans of their issues.
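By analogy with eigenfaces, the eigenpages can be sketched as the principal components of flattened page scans; a drift of the per-year mean coordinates then signals a layout change. The image size and data below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stack of scanned pages, downsampled to 64x48 grayscale.
rng = np.random.default_rng(2)
pages = rng.random((500, 64 * 48))

pca = PCA(n_components=10)
coords = pca.fit_transform(pages)                  # each page as 10 coordinates
eigenpages = pca.components_.reshape(-1, 64, 48)   # the 'eigenpages' themselves

# Comparing yearly means of `coords` over time (or across titles) would
# expose layout changes as jumps in this low-dimensional space.
print(coords.shape, eigenpages.shape)
```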

*

Machine Vision Algorithms on Cadaster Plans

S. Ares Oliveira; I. di Lenardo; F. Kaplan

2017. Premiere Annual Conference of the International Alliance of Digital Humanities Organizations (DH 2017), Montreal, Canada, August 8-11, 2017.

Cadaster plans are cornerstones for reconstructing dense representations of the history of the city. They provide information about the urban shape of the city, enabling the reconstruction of the footprints of the most important urban components, as well as information about the urban population and city functions. However, as some of these handwritten documents are more than 200 years old, establishing a processing pipeline for interpreting them remains extremely challenging. We present the first implementation of a fully automated process capable of segmenting and interpreting Napoleonic Cadaster Maps of the Veneto Region dating from the beginning of the 19th century. Our system extracts the geometry of each of the drawn parcels, and classifies, reads, and interprets the handwritten labels.

*

Littérature Potentielle 2.0

C. B. Daniel de Roulet

Le Persil. 2017.

*

Machine Vision Algorithms on Cadaster Plans

S. Ares Oliveira; I. di Lenardo

Premiere Annual Conference of the International Alliance of Digital Humanities Organizations (DH 2017), Montreal, Canada, August 8-11, 2017.

Cadaster plans are cornerstones for reconstructing dense representations of the history of the city. They provide information about the urban shape of the city, enabling the reconstruction of the footprints of the most important urban components, as well as information about the urban population and city functions. However, as some of these handwritten documents are more than 200 years old, establishing a processing pipeline for interpreting them remains extremely challenging. We present the first implementation of a fully automated process capable of segmenting and interpreting Napoleonic Cadaster Maps of the Veneto Region dating from the beginning of the 19th century. Our system extracts the geometry of each of the drawn parcels, and classifies, reads, and interprets the handwritten labels.

*

Mapping large organizations

D. Rodighiero

IMD Business School, Lausanne, Switzerland, December 13, 2017.

Today, organizations are more complex than ever. They are large, ramified, distributed, and intertwined, so that their organic structure seems like a tangle of activities. Day by day, individuals contribute to keeping these structures alive with their work, thoughts, and personalities, and as a result organizations rely on these daily practices. Contemporary sociology aims to untangle this network of practices through the analysis of digital traces distributed in desktop computers, smartphones, Wi-Fi and GPS signals, payment systems, badges, information systems, etc. Digital traces are all the information that individuals leave behind during their daily activities. The challenge is therefore to recompose a faithful image of an organization using the data that its members leave behind in various forms.

*

Using Linked Open Data to Bootstrap a Knowledge Base of Classical Texts

M. Romanello; M. Pasin

2017. Second Workshop on Humanities in the Semantic Web (WHiSe II) co-located with the 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 22, 2017. p. 3-14.

We describe a domain-specific knowledge base aimed at supporting the extraction of bibliographic references in the domain of Classics. In particular, we deal with references to canonical works of the Greek and Latin literature by providing a model that represents key aspects of this domain such as names and abbreviations of authors, the canonical structure of classical works, and links to related web resources. Finally, we show how the availability of linked data in the emerging Graph of Ancient World Data has helped bootstrapping the creation of our knowledge base.

*

Mapping the Analogous City

D. Rodighiero

Re-descriptions: Aldo Rossi’s Architectural Composition, Delft University of Technology, Netherlands, December 1, 2017.

Re-descriptions: Aldo Rossi’s Architectural Composition. Chair of Architecture and the Public Building, Friday 1 December 2017, Room U, 10:45 – 12:30. On the occasion of the publication, in the new issue of OverHolland, of the English translation of Ezio Bonfanti’s seminal essay ‘Elementi e costruzione: Note sull’architettura di Aldo Rossi’ (1970), the Chair of Public Building organises a lecture event to examine Rossi’s compositional procedures, their relationship with his architectural theory and, more generally, their significance for architectural design research. In addition to the introduction of Bonfanti’s reading of Rossi’s work by Henk Engel and Stefano Milani, invited speaker Dario Rodighiero (École polytechnique fédérale de Lausanne) will present the research ‘The Analogous City. The Map’, which examines ‘piece by piece’ Aldo Rossi’s collage executed in 1976 in cooperation with Eraldo Consolascio, Bruno Reichlin and Fabio Reinhart. Programme: 10:45 – 11:00 Henk Engel (TUD), Introduction to OverHolland nos. 18-19; 11:00 – 11:30 Stefano Milani (TUD), Re-descriptions: On Ezio Bonfanti’s essay ‘Elements and constructions. Notes on Aldo Rossi’s architecture’; 11:30 – 12:15 Dario Rodighiero (EPFL), Mapping the Analogous City; 12:15 – 12:45 Discussion.

*

Analyse multi-échelle de n-grammes sur 200 années d'archives de presse

V. Buntinx / F. Kaplan; A. Xanthos (Dir.)

Lausanne, EPFL, 2017.

DOI : 10.5075/epfl-thesis-8180.

The recent availability of large corpora of digitized texts spanning several centuries opens the way to new forms of studies on the evolution of languages. In this thesis, we study a corpus of 4 million press articles covering a period of 200 years. The thesis tries to measure the evolution of written French over this period at the level of words and expressions, but also in a more global way, by attempting to define integrated measures of linguistic evolution. The methodological choice is to introduce a minimum of linguistic hypotheses into this study by developing new measures around the simple notion of the n-gram, a sequence of n consecutive words. On this basis, the thesis explores the potential of already known concepts, such as temporal frequency profiles and their diachronic correlations, but also introduces new abstractions, such as the notion of a resilient linguistic kernel or the decomposition of profiles into solidified expressions according to simple statistical models. Through the use of distributed computational techniques, it develops methods to test the relevance of these concepts on a large amount of textual data, and thus makes it possible to propose a virtual observatory of the diachronic evolutions associated with a given corpus. On this basis, the thesis explores more precisely the multi-scale dimension of linguistic phenomena by considering how standardized measures evolve when applied to increasingly long n-grams. The discrete and continuous scale from isolated entities (n=1) to increasingly complex and structured expressions (1 < n < 10) offers an axis of study transversal to the classical differentiations that ordinarily structure linguistics: syntax, semantics, pragmatics, and so on. The thesis explores the quantitative and qualitative diversity of phenomena at these different scales of language and develops a novel approach by proposing multi-scale measurements and formalizations, with the aim of characterizing more fundamental structural aspects of the studied phenomena.

*

Annotated References in the Historiography on Venice: 19th–21st centuries

G. Colavizza; M. Romanello

Journal of Open Humanities Data. 2017.

DOI : 10.5334/johd.9.

We publish a dataset containing more than 40’000 manually annotated references from a broad corpus of books and journal articles on the history of Venice. References were collected from both reference lists and footnotes, and include primary and secondary sources, in full or abbreviated form. The dataset comprises references from publications from the 19th to the 21st century. References were taken from a newly digitized corpus and manually annotated in all their constituent parts. The dataset is stored in a GitHub repository, persisted in Zenodo, and accompanied by code to train parsers in order to extract references from other publications. Two trained Conditional Random Fields models are provided along with their evaluation, in order to act as a baseline for a parsing shared task. No comparable public dataset exists to support the task of reference parsing in the humanities. The dataset is of interest to all those working on reference parsing and citation extraction in the humanities.
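A small sketch of training a Conditional Random Fields reference parser of the kind provided with the dataset, here using the sklearn-crfsuite package on two hypothetical annotated references; the feature set and labels are illustrative assumptions, not the published models.

```python
import sklearn_crfsuite

def features(tokens, i):
    t = tokens[i]
    return {
        "lower": t.lower(),
        "is_digit": t.isdigit(),
        "is_title": t.istitle(),
        "has_period": "." in t,
        "position": i,
    }

# Hypothetical annotated references (token -> constituent-part label).
refs = [
    ("Romanin S. Storia documentata di Venezia 1853".split(),
     ["author", "author", "title", "title", "title", "title", "year"]),
    ("Lane F. Venice a maritime republic 1973".split(),
     ["author", "author", "title", "title", "title", "title", "year"]),
]
X = [[features(toks, i) for i in range(len(toks))] for toks, _ in refs]
y = [labels for _, labels in refs]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])   # predicted labels for the first reference
```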

*

TimeRank: A dynamic approach to rate scholars using citations

M. Franceschet; G. Colavizza

Journal of Informetrics. 2017.

DOI : 10.1016/j.joi.2017.09.003.

Rating has become a common practice of modern science. No rating system can be considered final; instead, several approaches can be taken, each magnifying different aspects of the fabric of science. We introduce an approach for rating scholars which uses citations in a dynamic fashion, allocating ratings by considering the relative position of two authors at the time of the citation between them. Our main goal is to introduce the notion of citation timing as a complement to the usual suspects of popularity and prestige. We aim to produce a rating able to account for a variety of interesting phenomena, such as putting rising stars on a more even footing with established researchers. We apply our method to the bibliometrics community using data from the Web of Science from 2000 to 2016, showing how the dynamic method is more effective than the alternatives in this respect.
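An Elo-style caricature of the dynamic idea: each citation is treated as a chronologically ordered encounter whose rating transfer depends on the two authors' ratings at that moment, so a citation received from an already highly rated author moves ratings more. This is a simplified stand-in, not the published TimeRank formula; names and constants are hypothetical.

```python
def expected(r_cited, r_citing):
    """Elo-style expectation that the cited author outranks the citing one."""
    return 1 / (1 + 10 ** ((r_citing - r_cited) / 400))

def timerank_like(ratings, citations, k=16):
    """Process (time, citing, cited) triples in chronological order;
    each citation is a 'match' won by the cited author."""
    for time, citing, cited in sorted(citations):
        e = expected(ratings[cited], ratings[citing])
        ratings[cited] += k * (1 - e)    # surprising citations move ratings more
        ratings[citing] -= k * (1 - e)
    return ratings

ratings = {"a": 1500.0, "b": 1500.0, "c": 1500.0}
citations = [(2001, "a", "b"), (2002, "c", "b"), (2003, "a", "c")]
print(timerank_like(ratings, citations))
```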

*

Méduse, vers des visualisations plus complexes que le réseau

A. Rigal; D. Rodighiero

Chronotopies, lecture et écriture des mondes en mouvement; Grenoble: Elya Éditions, 2017.

For the international Digital Humanities conference held in Lausanne in 2014, we produced a representation: a network cartography of the conference's authors and keywords. The resulting maps were reproduced on various objects: a tarpaulin, a carpet, books, posters, mugs. These objects were meant to attract the authors' interest and foster their identification with the field of the digital humanities. The strength of network cartography is that it excludes few actors, in our case few participants. A large number of conference participants could therefore find themselves on the representation and thereby take part in the collective suggested by the links of the cartography. Through these reproductions, which are never entirely mechanical, the representation circulated, feeding interpretations that trace the contours of a collective specific to the conference. The traces produced by the participants (comments on the cartography, photos, memories, tweets, etc.) make it possible to follow the trajectory of the representation. Consequently, deciding whether the representation succeeded amounts to investigating the extent and quality of its trajectory across these trials. The aim of this article is thus to investigate cartographic design as an art of gathering, using the tools of cartographic design.

*

Linked Lexical Knowledge Bases Foundations and Applications

M. Ehrmann

Computational Linguistics. 2017.

DOI : 10.1162/COLI_r_00289.

*

The Core Literature of the Historians of Venice

G. Colavizza

Frontiers in Digital Humanities. 2017.

DOI : 10.3389/fdigh.2017.00014.

Over the past decades, the humanities have been accumulating a growing body of literature at an increasing pace. How does this impact their traditional organization into disciplines and fields of research therein? This article considers history, by examining a citation network among recent monographs on the history of Venice. The resulting network is almost connected, clusters of monographs are identifiable according to specific disciplinary areas (history, history of architecture, and history of arts) or periods of time (middle ages, early modern, and modern history), and a map of the recent trends in the field is sketched. Most notably a set of highly cited works emerges as the core literature of the historians of Venice. This core literature comprises a mix of primary sources, works of reference, and scholarly monographs and is important in keeping the field connected: monographs usually cite a combination of few core and a variety of less well-cited works. Core primary sources and works of reference never age, while core scholarly monographs are replaced at a very slow rate by new ones. The reliance of new publications on the core literature is slowly rising over time, as the field gets increasingly more varied.

*

The structural role of the core literature in history

G. Colavizza

Scientometrics. 2017.

DOI : 10.1007/s11192-017-2550-4.

The intellectual landscapes of the humanities are mostly uncharted territory. Little is known about the ways the published research of humanist scholars defines areas of intellectual activity. An open question relates to the structural role of the core literature: highly cited sources naturally play a disproportionate role in the definition of intellectual landscapes. We introduce four indicators in order to map the structural role played by core sources in connecting different areas of the intellectual landscape of citing publications (i.e., communities in the bibliographic coupling network). All indicators factor out the influence of degree distributions by internalizing a null configuration model. By considering several datasets focused on history, we show that two distinct structural actions are performed by the core literature: a global one, by connecting otherwise separated communities in the landscape, or a local one, by raising connectivity within communities. In our study, the global action is mainly performed by small sets of scholarly monographs, reference works and primary sources, while the rest of the core, and especially most journal articles, acts mostly locally.

*

Apprenticeship in Early Modern Venice

A. Bellavitis; R. Cella; G. Colavizza

2017.

The desire of the Republican state to regulate the production and sale of food led to the establishment, during the twelfth century, of the Giustizia Vecchia, a magistracy which later developed authority over the majority of the city’s guilds. The further decision to set up a public register of contracts of apprenticeship reflects the ambition of Venetian authorities to regulate and control both vocational training and access to the urban job market, acting as a guarantor between masters and young apprentices. This chapter presents an historical overview of apprenticeship in early modern Venice, examining the development of the city’s legislation on the matter, and analysing a new sample of contracts recorded in the city’s apprenticeship registers during the sixteenth and seventeenth centuries. In particular, we discuss the complex relationship between the general legal framework established by Venetian public authorities and the particular sets of norms detailed in guild statutes. Our analysis reveals that apprenticeship contracts were used to accommodate a variety of situations, ranging from paid intensive training to masked working contracts, while following the general framework provided by state and guild regulations. We then present an in-depth study of apprenticeship contracts from three crafts (goldsmiths, carpenters and printers), chosen for their economic importance and because they possibly represented different realities in terms of technological specialization, capital (or labour) intensiveness, and typology of market. This highlights yet another aspect of apprenticeship in Venice: the influence of guilds. Some guilds, such as the goldsmiths, were more closed to foreigners, favouring Venetians instead. Apprenticeship in early modern Venice was an institution which, despite appearing highly regulated and formalized, accommodated a variety of realities with remarkable flexibility.

*

A Simple Set of Rules for Characters and Place Recognition in French Novels

C. Bornet; F. Kaplan

Frontiers in Digital Humanities. 2017.

DOI : 10.3389/fdigh.2017.00006.

*

Big Data of the Past

F. Kaplan; I. di Lenardo

Frontiers in Digital Humanities. 2017.

DOI : 10.3389/fdigh.2017.00012.

Big Data is not a new phenomenon. History is punctuated by regimes of data acceleration, characterized by feelings of information overload accompanied by periods of social transformation and the invention of new technologies. During these moments, private organizations, administrative powers, and sometimes isolated individuals have produced important datasets, organized following a logic that is often subsequently superseded but was at the time, nevertheless, coherent. To be translated into relevant sources of information about our past, these document series need to be redocumented using contemporary paradigms. The intellectual, methodological, and technological challenges linked to this translation process are the central subject of this article.

*

Narrative Recomposition in the Context of Digital Reading

C. A. M. Bornet / F. Kaplan (Dir.)

Lausanne, EPFL, 2017.

DOI : 10.5075/epfl-thesis-7592.

In any creative process, the tools one uses have an immediate influence on the shape of the final artwork. However, while the digital revolution has redefined core values in most creative domains over the last few decades, its impact on literature remains limited. This thesis explores the relevance of digital tools for several aspects of novel writing by focusing on two research questions: Is it possible for an author to edit better novels out of already published ones, given access to adapted tools? And will authors change their way of writing when they know how they are being read? This thesis is a multidisciplinary participatory study, actively involving the Swiss novelist Daniel de Roulet, to construct measures, visualizations, and digital tools aimed at supporting the dynamic reordering of narrative material, similar to how one edits video footage. We developed and tested various text analysis and visualization tools, the results of which were interpreted and used by the author to recompose a family saga out of material he had been writing for twenty-four years. Based on this research, we released Saga+, an online editing, publishing, and reading tool. The platform was handed over to third parties to improve existing writings, making new novels available to the public as a result. While many researchers have studied the structuring of texts either through global statistical features or micro-syntactic analyses, we demonstrate that, by allowing visualization and interaction at an intermediary level of organization, authors can manipulate their own texts in agile ways. By integrating readers’ traces into this newly revealed structure, authors can start to approach the question of optimizing their writing processes in ways similar to what is practiced in other media industries. The introduction of tools for optimal composition opens new avenues for authors, as well as a controversial debate regarding the future of literature.

*

Optimized scripting in Massive Open Online Courses

F. Kaplan; I. di Lenardo

Dariah Teach, Université de Lausanne, Switzerland, March 23-24, 2017.

The Time Machine MOOC, currently under preparation, is designed to provide the necessary knowledge for students to use the editing tool of the Time Machine platform. The first test case of the platform is centered on our current work on the city of Venice and its archives. Small teaching modules focus on specific skills of increasing difficulty: segmenting a word on a page, transcribing a word from a document series, georeferencing ancient maps using homologous points, disambiguating named entities, redrawing urban structures, finding matching details between paintings, and writing scripts that perform some of these tasks automatically. Other skills include actions in the physical world, like scanning pages, books, and maps, or performing a photogrammetric reconstruction of a sculpture by taking a large number of pictures. Eventually, some other modules are dedicated to general historical, linguistic, technical, or archival knowledge that constitutes a prerequisite for mastering specific tasks. A general dependency graph has been designed, specifying in which order the skills can be acquired. The performance of most tasks can be tested using predefined exercises and evaluation metrics, which allows for a precise evaluation of the level of mastery of each student. When students successfully pass the test related to a skill, they get the credentials to use that specific tool in the platform and start contributing. However, the teaching options can vary greatly for each skill. Building upon the script concept developed by Dillenbourg and colleagues, we designed each tutorial as a parameterized sequence. A simple gradient descent method is used to progressively optimize the parameters in order to maximize the success rate of the students at the skill tests, and therefore to seek a form of optimality among the various design choices for the teaching methods. Thus, the more students use the platform, the more efficient the teaching scripts become.
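A toy sketch of that optimization loop under stated assumptions: the student success rate is simulated by a hypothetical oracle with a sweet spot at [3.0, 1.0], and the gradient is estimated by central finite differences, since the true gradient of a pass rate is unavailable.

```python
import random

def success_rate(params, n=4000):
    """Hypothetical simulator: the closer the script parameters are to the
    sweet spot [3.0, 1.0], the larger the share of students who pass."""
    rng = random.Random(0)               # fixed seed: a deterministic oracle
    d2 = (params[0] - 3.0) ** 2 + (params[1] - 1.0) ** 2
    return sum(rng.gauss(0, 3) > d2 - 1 for _ in range(n)) / n

def finite_difference_ascent(params, lr=2.0, eps=0.1, steps=40):
    """Gradient ascent on the pass rate, with numerically estimated gradients."""
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            grad.append((success_rate(up) - success_rate(down)) / (2 * eps))
        params = [p + lr * g for p, g in zip(params, grad)]
    return params

print(finite_difference_ascent([1.0, 0.0]))   # drifts toward [3.0, 1.0]
```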

*

The references of references: a method to enrich humanities library catalogs with citation data

G. Colavizza; M. Romanello; F. Kaplan

International Journal on Digital Libraries. 2017.

DOI : 10.1007/s00799-017-0210-1.

The advent of large-scale citation indexes has greatly impacted the retrieval of scientific information in several domains of research. The humanities have largely remained outside of this shift, despite their increasing reliance on digital means for information seeking. Given that publications in the humanities have a longer than average life-span, mainly due to the importance of monographs for the field, this article proposes to use domain-specific reference monographs to bootstrap the enrichment of library catalogs with citation data. Reference monographs are works considered to be of particular importance in a research library setting, and likely to possess characteristic citation patterns. The article shows how to select a corpus of reference monographs, and proposes a pipeline to extract the network of publications they refer to. Results using a set of reference monographs in the domain of the history of Venice show that only 7% of extracted citations are made to publications already within the initial seed. Furthermore, the resulting citation network suggests the presence of a core set of works in the domain, cited more frequently than average.
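
For intuition, here is a minimal sketch of the seed-expansion measurement described above, with an invented toy citation list: build a directed graph from monographs to the publications they cite, then check what share of citations points back into the initial seed.

```python
import networkx as nx

# Invented citations: reference monograph -> cited publication.
citations = [
    ("Lane 1973", "Besta 1899"),
    ("Lane 1973", "Cessi 1944"),
    ("Cozzi 1995", "Lane 1973"),      # points back into the seed
    ("Cozzi 1995", "Tenenti 1961"),
]
seed = {"Lane 1973", "Cozzi 1995"}

G = nx.DiGraph(citations)
within = sum(1 for _, target in G.edges if target in seed)
print(f"{within / G.number_of_edges():.0%} of citations stay within the seed")

# A candidate core of the domain: the most frequently cited works.
print(sorted(G.in_degree, key=lambda kv: kv[1], reverse=True)[:3])
```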

*

Studying Linguistic Changes over 200 Years of Newspapers through Resilient Words Analysis

V. Buntinx; C. Bornet; F. Kaplan

Frontiers in Digital Humanities. 2017.

DOI : 10.3389/fdigh.2017.00002.

This paper presents a methodology for analysing linguistic changes in a given textual corpus that overcomes two common problems in corpus linguistics studies: the monotonic increase of corpus size over time and the presence of noise in the textual data. In addition, our method allows us to better target the linguistic evolution of the corpus, rather than other aspects such as noise fluctuation or topic evolution. A corpus formed by two newspapers, “La Gazette de Lausanne” and “Le Journal de Genève”, is used, providing 4 million articles from 200 years of archives. We first perform some classical measurements on this corpus in order to provide indicators and visualizations of linguistic evolution. We then define the concepts of lexical kernel and word resilience to face the two challenges of noise and corpus-size fluctuation. The paper ends with a discussion comparing the results of the linguistic change analysis and concludes with possible directions for future work.
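
As a rough reading of the idea (the paper's formal definitions are richer), the lexical kernel can be taken as the set of words present in every time slice, and a word's resilience as the share of slices in which it survives:

```python
# Toy yearly vocabularies standing in for the newspaper corpus.
yearly_vocab = {
    1850: {"liberté", "chemin", "vapeur", "journal"},
    1900: {"liberté", "journal", "automobile", "vapeur"},
    1950: {"liberté", "journal", "automobile", "télévision"},
}

# Lexical kernel: words that maintain themselves over the whole period.
kernel = set.intersection(*yearly_vocab.values())
print(kernel)  # {'liberté', 'journal'}

def resilience(word):
    """Crude resilience score: fraction of slices containing the word."""
    return sum(word in vocab for vocab in yearly_vocab.values()) / len(yearly_vocab)

print(round(resilience("vapeur"), 2))  # ~0.67: present, then fading
```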

*

A View on Venetian Apprenticeship through the Garzoni Database

G. Colavizza

2017. Garzoni. Apprendistato, Lavoro e Società a Venezia e in Europa, XVI-XVIII secolo, Venice, Italy, October 10-11, 2014. p. 235-260.

A sample of apprenticeship contracts from three periods in the history of early modern Venice is analysed, as recorded in the archive of the Giustizia Vecchia, a Venetian magistracy. The periods are the end of the 16th century, the 1620s, and the 1650s. A set of findings is discussed. First, the variety of professions represented in the dataset narrows over time as the proportion of Venetian apprentices increases, in accordance with previous literature highlighting the decline of the Venetian economy during the 17th century. Secondly, apprenticeships are found to divide into two broad groups: contracts that stipulated a payment from the master to the apprentice (circa 80%) and those that did not. The first group is suggested to represent contracts used in part, sometimes exclusively, to hire a cheap workforce as well as to provide training. Lastly, professional profiles are introduced as combinations of statistics, which provide evidence of three typologies of professions with respect to apprenticeship market dynamics.

2016

*

From Documents to Structured Data: First Milestones of the Garzoni Project

M. Ehrmann; G. Colavizza; O. Topalov; R. Cella; D. Drago et al.

DHCommons. 2016.

Led by an interdisciplinary consortium, the Garzoni project undertakes the study of apprenticeship, work and society in early modern Venice by focusing on a specific archival source, namely the Accordi dei Garzoni from the Venetian State Archives. The project revolves around two main phases with, in the first instance, the design and the development of tools to extract and render information contained in the documents (according to Semantic Web standards) and, as a second step, the examination of such information. This paper outlines the main progress and achievements during the first year of the project.

*

Ancient administrative handwritten documents: virtual x-ray reading

F. Albertin; G. Margaritondo; F. Kaplan

2016.

Patent number(s) :
WO2015189817

A method for detecting ink writings in a specimen comprising stacked pages, allowing a page-by-page reading without turning pages. The method comprises the steps of taking a set of projection x-ray images for different positions of the specimen with respect to an x-ray source and a detector from an apparatus for taking projection x-ray images; storing the set of projection x-ray images in a suitable computer system; and processing the set of projection x-ray images to tomographically reconstruct the shape of the specimen.

*

Rendre le passé présent

F. Kaplan

Forum des 100, Université de Lausanne, Switzerland, May 2016.

The conception of a four-dimensional space, whose agile navigation makes it possible to reintroduce a fluid continuity between present and past, belongs to the old philosophical and technological dream of the time machine. The historical moment to which we are invited is the continuation of a long process in which fiction, technology, science and culture intermingle. The time machine is that ever-debated horizon, progressively approached and, today, perhaps attainable for the first time.

*

La modélisation du temps dans les Digital Humanities

F. Kaplan

Regimes temporels et sciences historiques, Bern, October 14, 2016.

Digital interfaces are optimized every day to offer frictionless navigation through the multiple dimensions of the present. It is this fluidity, characteristic of this new relationship to documentary records, that the Digital Humanities could manage to reintroduce into the exploration of the past. A simple button should allow us to slide from a representation of the present to the representation of the same referent 10, 100 or 1000 years ago. Ideally, interfaces for navigating in time should offer the same agility of action as those that let us zoom in and out on objects as large and dense as the terrestrial globe. Textual search, the new gateway to knowledge since the beginning of the 21st century, should extend with the same simplicity to the contents of the documents of the past. Visual search, the second great moment in the indexing of the world, whose first results are beginning to enter our everyday digital practices, could be the keystone of access to the billions of documents that we must now make accessible in digital form. To make the past present, it must be restructured according to the logic of the structures of digital society. What would time become in this transformation? Simply another dimension of space? The answer is perhaps more subtle.

*

L’Europe doit construire la première Time Machine

F. Kaplan

2016.

The Time Machine project, competing for the new FET Flagships, proposes a unique archiving and computing infrastructure to structure, analyse and model data from the past, realign them with the present and make it possible to project into the future. It is supported by 70 institutions from 20 countries and by 13 international programmes.

*

Visual Link Retrieval in a Database of Paintings

B. L. A. Seguin; C. Striolo; I. di Lenardo; F. Kaplan

2016. VISART Workshop, ECCV, Amsterdam, September 2016.

DOI : 10.1007/978-3-319-46604-0_52.

This paper examines how far state-of-the-art machine vision algorithms can be used to retrieve common visual patterns shared by series of paintings. The search for such visual patterns, central to art history research, is challenging because of the diversity of similarity criteria that could relevantly demonstrate genealogical links. We design a methodology and a tool to efficiently annotate clusters of similar paintings and test various algorithms in a retrieval task. We show that a pretrained convolutional neural network performs better at this task than other machine vision methods aimed at photograph analysis. We also show that retrieval performance can be significantly improved by fine-tuning a network specifically for this task.
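
A minimal sketch of the retrieval step, assuming a generic pretrained CNN from torchvision as descriptor extractor and cosine similarity for ranking; the file names are hypothetical and the fine-tuning stage reported in the paper is omitted:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()        # keep the 2048-d global descriptor
model.eval()
preprocess = weights.transforms()

def descriptor(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return torch.nn.functional.normalize(model(img), dim=1)

# Rank a (hypothetical) painting database by similarity to a query image.
db_paths = ["virgin_bellini.jpg", "virgin_workshop.jpg", "landscape.jpg"]
db = torch.cat([descriptor(p) for p in db_paths])
scores = (db @ descriptor("virgin_query.jpg").T).squeeze(1)
for path, score in sorted(zip(db_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```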

*

Epidemics in Venice: On the Small or Large Nature of the Pre-modern World

G. Colavizza

2016. International Workshop on Computational History and Data-Driven Humanities, Dublin, Ireland, May 25, 2016. p. 33-40.

DOI : 10.1007/978-3-319-46224-0_4.

Marvel et al. [12] recently argued that the pre-modern contact world was physically and, by set inclusion, socially not small-world. Since the Black Death and similar plagues used to spread in well-defined waves, the argument goes, the underlying contact network could not have been small-world. I counter here that small-world contact networks were likely to exist in pre-modern times in a setting of the greatest importance for the outbreak of epidemics: urban environments. I show this by running epidemic diffusion simulations on the transportation network of Venice, verifying how this network becomes small-world when naval transportation is taken into account. Large epidemic outbreaks might not even have been possible without the catalyst of urban small-worlds.
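
The paper runs its simulations on the actual Venetian transportation network; the sketch below only illustrates the mechanism on a synthetic ring lattice, where a handful of random "naval" shortcuts shortens average path lengths and accelerates diffusion:

```python
import random
import networkx as nx

random.seed(0)

# Ring lattice as a stand-in for streets: high clustering, long paths.
streets = nx.watts_strogatz_graph(n=500, k=4, p=0.0)
naval = streets.copy()
for _ in range(50):                    # naval links as long-range shortcuts
    u, v = random.sample(range(500), 2)
    naval.add_edge(u, v)

def spread(G, steps=30, beta=0.5):
    """Fraction of nodes reached by a simple SI diffusion from node 0."""
    infected = {0}
    for _ in range(steps):
        infected |= {nb for i in infected for nb in G[i]
                     if random.random() < beta}
    return len(infected) / G.number_of_nodes()

print("streets only :", spread(streets))
print("with shipping:", spread(naval))
print("average path length:",
      round(nx.average_shortest_path_length(streets), 1), "->",
      round(nx.average_shortest_path_length(naval), 1))
```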

*

Exploring Citation Networks to Study Intertextuality in Classics

M. Romanello

Digital Humanities Quarterly. 2016.

Referring constitutes such an essential scholarly activity across disciplines that it has been regarded by [Unsworth 2000] as one of the scholarly primitives. In Classics, in particular, references to passages of the ancient texts - the so-called canonical citations (or references) - play a prominent role. The potential of these citations, however, has not been fully exploited to date, despite the attention they have recently received in the field of Digital Humanities. In this paper I discuss two aspects of making such citations computable. Firstly, I illustrate how they can be extracted from text by using Natural Language Processing techniques, especially Named Entity Recognition. Secondly, I discuss the creation of a three-level citation network to formalise the web of relations between texts that canonical references implicitly constitute. As I outline in the concluding section of the paper, the possible uses of the extracted citation network include the development of search applications and recommender systems for bibliography; the enhancement of digital environments to read primary sources with links to related secondary literature; and the application of these networks to the study of intertextuality and text reception.

*

Clustering citation histories in the Physical Review

G. Colavizza; M. Franceschet

Journal of Informetrics. 2016.

DOI : 10.1016/j.joi.2016.07.009.

We investigate publications through their citation histories – the history events are the citations given to the article by younger publications, and the time of an event is the date of publication of the citing article. We propose a methodology, based on spectral clustering, to group citation histories, and the corresponding publications, into communities, and apply multinomial logistic regression to provide the revealed communities with semantics in terms of publication features. We study the case of publications from the full Physical Review archive, covering 120 years of physics in all its domains. We discover two clear archetypes of publications – marathoners and sprinters – that deviate from the average middle-of-the-road behaviour, and discuss some publication features, like age of references and type of publication, that are correlated with the membership of a publication in a certain community.
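
A compressed sketch of the approach on synthetic data: normalize citation histories so that shapes rather than volumes are compared, then let spectral clustering separate early-peaking "sprinters" from slowly accumulating "marathoners" (all parameters are illustrative, not those of the paper):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
years = np.arange(20)

# Synthetic histories: sprinters peak early, marathoners build up slowly.
sprinters   = rng.poisson(10 * np.exp(-years / 3.0), size=(30, 20))
marathoners = rng.poisson(0.5 + years / 5.0,         size=(30, 20))
X = np.vstack([sprinters, marathoners]).astype(float)
X /= X.sum(axis=1, keepdims=True) + 1e-9      # compare shapes, not volumes

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)
print(labels[:30])    # one community...
print(labels[30:])    # ...and the other
```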

*

Diachronic Evaluation of NER Systems on Old Newspapers

M. Ehrmann; G. Colavizza; Y. Rochat; F. Kaplan

2016. 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany, September 19-21, 2016. p. 97-107.

In recent years, many cultural institutions have engaged in large-scale newspaper digitization projects, and large amounts of historical texts are being acquired (via transcription or OCRization). Beyond document preservation, the next step consists in providing enhanced access to the content of these digital resources. In this regard, the processing of units which act as referential anchors, namely named entities (NE), is of particular importance. Yet the application of standard NE tools to historical texts faces several challenges, and performances are often not as good as on contemporary documents. This paper investigates the performances of different NE recognition tools applied to old newspapers by conducting a diachronic evaluation over 7 time series taken from the archives of the Swiss newspaper Le Temps.
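
The core of such a diachronic evaluation is simply scoring the system separately on each time slice; a minimal sketch with invented gold and predicted mentions:

```python
def prf(gold, pred):
    """Precision, recall and F1 over sets of (doc, surface, type) mentions."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Invented annotations for two time slices of a newspaper archive.
series = {
    "1820s": ({(1, "Genève", "LOC"), (1, "Rothschild", "PER")},
              {(1, "Genève", "LOC")}),
    "1990s": ({(2, "Genève", "LOC"), (2, "ONU", "ORG")},
              {(2, "Genève", "LOC"), (2, "ONU", "ORG")}),
}
for era, (gold, pred) in series.items():
    p, r, f = prf(gold, pred)
    print(f"{era}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```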

*

Wikipedia's Miracle

F. Kaplan; N. Nova

Lausanne: EPFL PRESS.

Wikipedia has become the principal gateway to knowledge on the web. The doubts about information quality and the rigor of its collective negotiation process during its first couple of years have proved unfounded. Whether this delights or horrifies us, Wikipedia has become part of our lives. Flexible both in its form and its content, the online encyclopedia will continue to constitute one of the pillars of digital culture for decades to come. It is time to go beyond prejudices, to study its true nature, and to better understand the emergence of this “miracle.”

*

Le miracle Wikipédia

F. Kaplan; N. Nova

Lausanne: Presses Polytechniques et Universitaires Romandes.

Wikipedia has established itself as the main gateway to knowledge on the web. The debates of its early years concerning the quality of the information produced or the soundness of its collective negotiation process are now behind us. Whether we rejoice in it or deplore it, Wikipedia is now part of our lives. Flexible both in its form and in its content, the online encyclopedia will no doubt remain one of the pillars of digital culture for decades to come. Beyond prejudices, it is now time to study its true nature and to understand, in hindsight, how such a “miracle” could have happened.

*

La culture internet des mèmes

F. Kaplan; N. Nova

Lausanne: Presses Polytechniques et Universitaires Romandes.

We are at a moment of transition in the history of media. On the Internet, millions of people produce, alter and relay “memes”, digital contents with stereotyped motifs. This “culture” offers a new, rich and complex landscape to study. For the first time, a phenomenon that is at once global and local, popular and, in a certain way, elitist, constructed, mediated and structured by technology, can be observed with precision. Studying memes means not only understanding what digital culture is and may become, but also inventing a new approach capable of grasping the complexity of the circulation of motifs on a global scale.

*

Circulation of Opinions in Visualization Reading

D. Rodighiero

2016. International Working Conference on Advanced Visual Interfaces (AVI 2016), Bari, Italy, June 7-10, 2016. p. 13-19.

Visualizations need interpretation in order to grasp the meaning of a visual representation. They are complex, and the process of their creation is often hidden from the public. For this reason, the following text illustrates a way to read the visual representation of data by analysing reading across three intervals: detail, visualization, and context. These three moments make the structure of reading explicit, ending with interpretation: the moment in which the observer gains insights and becomes conscious of a personal kind of knowledge. Interpretation, which is composed of personal opinions, is a very important medium for keeping information in circulation and permitting an open dialogue with other observers who are reading the same visualization, even in the medical field. In this paper the photography of Luigi Ghirri illustrates the schematic approach; subsequently, the three intervals are exemplified using a medical example, in which my parents are involved in the reading of a blood test. The simple idea is that, through the circulation of opinions and dialogue, the interpretation of visualizations fosters the creation of common knowledge and improves each observer's capacity for reading.

*

Visual Patterns Discovery in Large Databases of Paintings

I. di Lenardo; B. L. A. Seguin; F. Kaplan

2016. Digital Humanities 2016, Kraków, Poland, July 11-16, 2016.

The digitization of large databases of photographs of works of art opens new avenues for research in art history. For instance, collecting and analyzing painting representations beyond the relatively small number of commonly accessible works was previously extremely challenging. In the coming years, researchers are likely to have easier access not only to representations of paintings from museum archives but also to those from private collections, fine-arts auction houses, and art historians' collections. However, access to large online databases is in itself not sufficient. There is a need for efficient search engines capable of searching painting representations not only on the basis of textual metadata but also directly through visual queries. In this paper we explore how convolutional neural network descriptors can be used in combination with algebraic queries to express powerful search queries in the context of art history research.
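
As a toy illustration of what an "algebraic query" over descriptors might look like (the vectors below are random stand-ins; in practice they would come from a CNN such as the one sketched earlier on this page):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Random stand-ins for L2-normalized CNN descriptors of three paintings.
db = {name: normalize(np.random.default_rng(i).normal(size=128))
      for i, name in enumerate(["annunciation_a", "annunciation_b", "portrait"])}

# "Like A and B, but unlike C": add the positives, subtract the negative.
query = normalize(db["annunciation_a"] + db["annunciation_b"] - db["portrait"])
ranking = sorted(db, key=lambda name: -float(db[name] @ query))
print(ranking)   # the two annunciations should outrank the portrait
```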

*

Visualizing Complex Organizations with Data

D. Rodighiero

IC Research Day, Lausanne, Switzerland, June 30, 2016.

The Affinity Map is a project funded by the ENAC whose aim is to provide an instrument for understanding organizations. The photograph shows the unveiling of the first map at the ENAC Research Day. The visualization was presented to the scholars who are displayed in the representation itself.

*

Navigating through 200 years of historical newspapers

Y. Rochat; M. Ehrmann; V. Buntinx; C. Bornet; F. Kaplan

2016. iPRES 2016 , Bern , October 3-6, 2016.

This paper aims to describe and explain the processes behind the creation of a digital library composed of two Swiss newspapers, namely Gazette de Lausanne (1798-1998) and Journal de Genève (1826-1998), covering an almost two-century period. We developed a general purpose application giving access to this cultural heritage asset; a large variety of users (e.g. historians, journalists, linguists and the general public) can search through the content of around 4 million articles via an innovative interface. Moreover, users are offered different strategies to navigate through the collection: lexical and temporal lookup, n-gram viewer and named entities.

*

Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms

G. Jacquet; M. Ehrmann; R. Steinberger; J. Väyrynen

2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.

This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource. Starting from a collection of millions of acronym/expansion pairs for 22 languages where expansion variants were grouped into monolingual clusters, we experiment with several aggregation strategies to link these clusters across languages. Aggregation strategies make use of string similarity distances and translation probabilities and they are based on vector space and graph representations. The accuracy of the approach is evaluated against Wikipedia's redirection and cross-lingual linking tables. The resulting multi-word entity resource contains 64,000 multi-word entities with unique identifiers and their 600,000 multilingual lexical variants. We intend to make this new resource publicly available.

*

Named Entity Resources - Overview and Outlook

M. Ehrmann; D. Nouvel; S. Rosset

2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.

Recognition of real-world entities is crucial for most NLP applications. Since its introduction some twenty years ago, named entity processing has undergone a significant evolution with, among others, the definition of new tasks (e.g. entity linking) and the emergence of new types of data (e.g. speech transcriptions, micro-blogging). These certainly pose new challenges, which affect not only methods and algorithms but especially linguistic resources. Where do we stand with respect to named entity resources? This paper aims at providing a systematic overview of named entity resources, accounting for qualities such as multilingualism, dynamicity and interoperability, and at identifying shortfalls in order to guide future developments.

*

JRC-Names: Multilingual Entity Name variants and titles as Linked Data

M. Ehrmann; G. Jacquet; R. Steinberger

Semantic Web. 2016.

DOI : 10.3233/SW-160228.

Since 2004 the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyam'in/Biniamin/Беньямин/بنيامين Netanyahu/Netanjahu/Nétanyahou/Netahny/Нетаньяху/نتنياهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as date ranges when the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.

*

Self-Recognition in Data Visualization: how people see themselves in social visualizations

D. Rodighiero; L. Cellard

PubPub. 2016.

Self-recognition is an intimate act performed by people. Inspired by Paul Ricoeur, we reflect upon the act of self-recognition, especially when a data visualization represents the observers themselves. Throughout the article, the reader is invited to think about this specific relationship through concepts like the personal identity stored in information systems, the truthfulness at the core of self-recognition, and mutual recognition among community members. In the context of highly interdisciplinary research, we unveil two protagonists in data visualization: the designer and the observer - the designer as the creator and the observer as the viewer of a visualization. This article deals with some theoretical aspects behind data visualization, a discipline more complex than normally expected. We believe that data visualization deserves a conceptual framework, and this investigation pursues that intention. For this reason, we look at the designer not just as a technician in the production of visualizations, but as a contemporary ethnologist: a professional working in a social environment to comprehend the context and formulate a specific inquiry with the help of appropriate visual languages.

*

Reading Data Together

D. Rodighiero

2016. VVH 2016 - 1st International Workshop on "Valuable visualization of healthcare information": from the quantified self data to conversations, Bari, Italy, June 7-10, 2016.

Network visualizations are among the most complex visualizations possible, yet they are sometimes incapable of describing system complexity. Even though they are the most widely employed visualization techniques, they still have limitations: a) their relations are not sufficient to analyse complexity, and b) networks do not distinguish between qualitative differences of the represented entities. Starting from the current network model, how could one manipulate this visualization to improve the comprehension of complexity? In this paper, we propose a solution called the trajectory. The trajectory has two major points of difference compared to the network: a) it represents not only distances but also durations, and b) it displays kinetic entities according to their evolution in time. The discourse is articulated around these four points. Considering that networks are tools widely used by digital humanists, we propose a new language to improve the quality of the represented data: a new network based on a vertical timeline. This complexification of the network visualization is not just a new language, but also a tool that would give the field of Digital Humanities the most complex of all possible visualizations.

*

Studying Linguistic Changes on 200 Years of Newspapers

V. Buntinx; C. Bornet; F. Kaplan

2016. Digital Humanities 2016, Kraków, Poland, July 11-16, 2016.

Large databases of scanned newspapers open new avenues for studying linguistic evolution. By studying a two-billion-word corpus corresponding to 200 years of newspapers, we compare several methods in order to assess how fast language is changing. After critically evaluating an initial set of methods for assessing textual distance between subsets corresponding to consecutive years, we introduce the notion of a lexical kernel, the set of unique words that maintain themselves over long periods of time. Focusing on linguistic stability instead of linguistic change allows building more robust measures to assess long term phenomena such as word resilience. By systematically comparing the results obtained on two subsets of the corpus corresponding to two independent newspapers, we argue that the results obtained are independent of the specificity of the chosen corpus, and are likely to be the results of more general linguistic phenomena.

*

A Method for Record Linkage with Sparse Historical Data

G. Colavizza; M. Ehrmann; Y. Rochat

2016. Digital Humanities Conference 2016, Kraków, Poland, July 11-16, 2016.

Massive digitization of archival material, coupled with automatic document processing techniques and data visualisation tools, offers great opportunities for reconstructing and exploring the past. An unprecedented wealth of historical data (e.g. names of persons, places, transaction records) can indeed be gathered through the transcription and annotation of digitized documents and thereby foster large-scale studies of past societies. Yet the transformation of hand-written documents into well-represented, structured and connected data is not straightforward and requires several processing steps. In this regard, a key issue is entity record linkage, a process aiming at linking different mentions in texts which refer to the same entity. Also known as entity disambiguation, record linkage is essential in that it makes it possible to identify genuine individuals, to aggregate multi-source information about single entities, and to reconstruct networks across documents and document series. In this paper we present an approach to automatically identify coreferential entity mentions of type Person in a dataset derived from Venetian apprenticeship contracts from the early modern period (16th-18th c.). Taking advantage of a manually annotated sub-part of the document series, we compute distances between pairs of mentions, combining various similarity measures based on (sparse) context information and person attributes.
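
A minimal sketch of the pairwise-distance idea using only Python's standard library: combine a string similarity on names with agreement over whatever attributes both mentions happen to carry (the weights and fields are illustrative, not the paper's):

```python
from difflib import SequenceMatcher

def name_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def mention_distance(m1, m2, w_name=0.7, w_attr=0.3):
    """Combine sparse evidence: name similarity plus attribute agreement."""
    shared = (set(m1) & set(m2)) - {"name"}
    attr = sum(m1[k] == m2[k] for k in shared) / len(shared) if shared else 0.5
    return 1.0 - (w_name * name_sim(m1["name"], m2["name"]) + w_attr * attr)

a = {"name": "Zuane de Piero", "profession": "marangon", "parish": "S. Polo"}
b = {"name": "Zuanne di Piero", "profession": "marangon"}
print(round(mention_distance(a, b), 3))   # small distance: likely the same person
```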

*

The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

G. Colavizza; M. Romanello; F. Kaplan

2016. 3rd International Workshop on Bibliometric-enhanced Information Retrieval (BIR2016), Padua, Italy, March 20-23, 2016. p. 32-43.

The advent of large-scale citation services has greatly impacted the retrieval of scientific information for several domains of research. The Humanities have largely remained outside of this shift despite their increasing reliance on digital means for information seeking. Given that publications in the Humanities probably have a longer than average life-span, mainly due to the importance of monographs in the field, we propose to use domain-specific reference monographs to bootstrap the enrichment of library catalogs with citation data. We exemplify our approach using a corpus of reference monographs on the history of Venice and extracting the network of publications they refer to. Preliminary results show that on average only 7% of extracted references are made to publications already within such corpus, therefore suggesting that reference monographs are effective hubs for the retrieval of further resources within the domain.

*

Two lectures about representing scientific communities by data visualization

D. Rodighiero

Academic Training Lecture Regular Programme, CERN, Geneva, Switzerland, March 14-15, 2016.

These lectures present research investigating the representation of communities and ways to foster their understanding by different audiences. Communities are complex multidimensional entities that are intrinsically difficult to represent synthetically. The way to represent them is likely to differ depending on the audience: governing entities trying to make decisions about the community's future, the general public trying to understand its nature, and the members of the community themselves. This work considers two communities as examples: the EPFL institutional community composed of faculty members and researchers and, at a worldwide level, the emerging community of Digital Humanities researchers. In both cases, the research is organized as a process going from graphical research to actual materialization as physical artefacts (posters, maps, etc.), possibly extended by digital devices (augmented-reality applications). Through iterative cycles of design and experimentation, the research explores theoretically (representation theory, studies of networks, cartography, etc.) and experimentally (development of methods to assess the relevance of each representation depending on the target audience) how to create effective community mappings. Its global ambition is to inform a theory of design that helps explain how certain community representations can lead to actual cognitive shifts in the way a community is understood.

First Day - Design Creation. The lecture proposes a new way to look at scientific communities. Faced with a very complex situation, where scholarly production is enormous and decisions are made using metrics widely judged obsolete, we propose a visual way to understand how a community is organized. How do scholars work together? What is the intermediary object that makes scientific researchers work together? This first session transforms the current situation into a visual object, a design artefact that embodies the elements needed to create maps for understanding and evaluating scientific communities.

Second Day - Use of the Maps. The lecture proposes the use of maps to understand and evaluate scientific communities. Continuing from the first lecture, the topic of the day is how to present the elementary objects (representing publications, teaching, grants, and subject matters) on a map. Several maps will be shown, representing a specific scientific community inside EPFL, with the perspective of making them adaptable to other communities. Moreover, much attention will be devoted to the reading and interpretation of these maps. Finally, a web-based piece of software will be introduced to illustrate to members and managers of any given community the benefits of a visual representation of a scientific organization.

*

La Ville Analogue, la Carte

D. Rodighiero

Aldo Rossi, La Finestra del Poeta, EPFL, Lausanne, Switzerland, February 29, 2016.

This new publication of The Analogous City, an artwork produced by Aldo Rossi, Eraldo Consolascio, Bruno Reichlin and Fabio Reinhart for the Venice Biennale of Architecture in 1976, is part of a museographic installation for the exhibition Aldo Rossi - The Window of the Poet at the Bonnefanten Museum in Maastricht. To gauge and explore this seminal work, Archizoom relied on Dario Rodighiero, candidate in the Doctoral Programme for Architecture and Sciences of the Cities and designer at the Digital Humanities Lab (DHLAB) at EPFL. Conceived as a genuine urban project, The Analogous City displays an aggregation of architectures drawn from collective and personal memories. What happens if we isolate the forms that Aldo Rossi and his friends so consciously placed in relation to each other? Rodighiero decomposed the work into its original references and then returned the pieces to the artwork, thus allowing us to simultaneously see the work and its visual vocabulary. An application based on augmented reality has been created to work in tandem with this publication, displaying the complete references belonging to the collage on different layers suspended over the artwork. By downloading the free application and installing it on your tablet or mobile phone, you can recreate the interaction of the museum installation whenever and wherever you are.

*

The Trajectories Tool: Amplifying Network Visualization Complexity

A. Rigal; D. Rodighiero; L. Cellard

2016. Digital Humanities 2016, Kraków, Poland, 12-16 July 2016.

Network visualizations are among the most complex visualizations possible, yet they are sometimes incapable of describing system complexity. Even though they are the most widely employed visualization techniques, they still have limitations: a) their relations are not sufficient to analyse complexity, and b) networks do not distinguish between qualitative differences of the represented entities. Starting from the current network model, how could one manipulate this visualization to improve the comprehension of complexity? In this paper, we propose a solution called the trajectory. The trajectory has two major points of difference compared to the network: a) it represents not only distances but also durations, and b) it displays kinetic entities according to their evolution in time. The discourse is articulated around these four points. Considering that networks are tools widely used by digital humanists, we propose a new language to improve the quality of the represented data: a new network based on a vertical timeline. This complexification of the network visualization is not just a new language, but also a tool that would give the field of Digital Humanities the most complex of all possible visualizations.

2015

*

S'affranchir des automatismes

B. Stiegler; F. Kaplan; D. Podalydès

Fabuleuses mutations, Cité des Sciences, December 8, 2015.

*

Les entités nommées pour le traitement automatique des langues

D. Nouvel; M. Ehrmann; S. Rosset

ISTE editions.

The digitized and connected world produces large quantities of data. Automatically analysing natural language is a major challenge for applications such as web search, news tracking, text mining, media monitoring, and opinion analysis. Research in information extraction has shown the importance of certain units, such as the names of persons, places and organisations, dates or amounts. The processing of these elements, the "named entities", has led to the development of algorithms and resources used by computer systems. Both theoretical and practical, this book offers tools for defining these entities, identifying them, linking them to knowledge bases, and evaluating named entity processing systems.

*

Venezia e l’invenzione del paesaggio urbano tra laguna e città

I. di Lenardo

Acqua e Cibo. Storie di Laguna e Città; Marsilio, 2015. p. 35-39.

*

The Venice Time Machine

F. Kaplan

2015. ACM Symposium on Document Engineering, Lausanne, Switzerland, September 8-11, 2015.

The Venice Time Machine is an international scientific programme launched by the EPFL and the University Ca’Foscari of Venice with the generous support of the Fondation Lombard Odier. It aims at building a multidimensional model of Venice and its evolution covering a period of more than 1000 years. The project's ambition is to reconstruct a large open-access database that could be used for research and education. Thanks to a partnership with the Archivio di Stato in Venice, kilometers of archives are currently being digitized, transcribed and indexed, laying the foundation of the largest database ever created on Venetian documents. The State Archives of Venice contain a massive amount of hand-written documentation in languages evolving from medieval times to the 20th century. An estimated 80 km of shelves are filled with over a thousand years of administrative documents, from birth registrations, death certificates and tax statements all the way to maps and urban planning designs. These documents are often very delicate and are occasionally in a fragile state of conservation. In complement to these primary sources, the contents of thousands of monographs have been indexed and made searchable.

*

Venice Time Machine : Recreating the density of the past

I. di Lenardo; F. Kaplan

2015. Digital Humanities 2015, Sydney, June 29 - July 3, 2015.

This article discusses the methodology used in the Venice Time Machine project (http://vtm.epfl.ch) to reconstruct a historical geographical information system covering the social and urban evolution of Venice over a period of 1,000 years. Given the time span considered, the project used a combination of sources and a specific approach to align heterogeneous historical evidence into a single geographic database. The project is based on the mass digitization of one of the largest archives in Venice, the Archivio di Stato. One goal of the project is to build a kind of ‘Google map’ of the past, presenting a hypothetical reconstruction of Venice in 2D and 3D for any year from the origins of the city to present-day Venice.

*

Trajectoire d’une représentation cartographique en réseau

A. Rigal; D. Rodighiero

Cartes & Géomatique: Temps, Art & Cartographie. 2015.

In the context of the international Digital Humanities conference held in Lausanne in 2014, we produced a representation: a network cartography of the conference's authors and keywords. The resulting maps were reproduced on various objects: a canvas, a carpet, books, posters, mugs. Their function was to arouse the authors' interest and their identification with the field of digital humanities. The quality of network cartography is that it excludes few actors, in our case few participants. A large number of conference participants could therefore find themselves in the representation and thereby take part in the collective suggested by the links of the cartography. Through these reproductions, which are never entirely mechanical, the representation circulated, feeding interpretations that trace the contours of a collective specific to the conference. The traces produced by the participants (comments on the cartography, photos, souvenirs, tweets, etc.) make it possible to follow the trajectory of the representation. Consequently, knowing whether the representation succeeded amounts to investigating the extent and quality of its trajectory across these trials. The aim of this article is thus to investigate cartographic design as an art of assembling, using the tools of cartographic design.

*

On Mining Citations to Primary and Secondary Sources in Historiography

G. Colavizza; F. Kaplan

2015. Clic-IT 2015, Trento, Italy, December 3-4, 2015.

We present preliminary results from the Linked Books project, which aims at analysing citations from the historiography on Venice. A preliminary goal is to extract and parse citations to both primary and secondary sources from any location in the text, especially footnotes. We detail a pipeline for these tasks based on a set of classifiers and test it on the Archivio Veneto, a journal in the domain.

*

Character network analysis of Émile Zola’s Les Rougon-Macquart

Y. Rochat

2015. Digital Humanities 2015, Sydney, June 29 - July 3, 2015.

In this work, we use network analysis methods to sketch a typology of fiction novels based on characters and their proximity in the narration. We construct character networks modelling the twenty novels composing Les Rougon-Macquart, written by Émile Zola. To categorise them, we rely on methods that track down major and minor characters relative to the character-systems. To that end, we use centrality measures such as degree and eigenvector centrality. Finally, with this analysis of a small corpus, we set the stage for a large-scale analysis of novels through their character networks.
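
A minimal sketch of the method with networkx on an invented co-occurrence network: characters are nodes, co-appearances are weighted edges, and degree and eigenvector centrality single out major characters:

```python
import networkx as nx

# Invented co-occurrence network for one novel: an edge links two
# characters appearing in the same narrative unit, weighted by frequency.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Gervaise", "Coupeau", 42), ("Gervaise", "Lantier", 35),
    ("Gervaise", "Nana", 12), ("Coupeau", "Lantier", 9),
])

degree = nx.degree_centrality(G)
eigen = nx.eigenvector_centrality(G, weight="weight")
for character in G:
    print(f"{character:9s} degree={degree[character]:.2f} "
          f"eigenvector={eigen[character]:.2f}")
```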

*

Text Line Detection and Transcription Alignment: A Case Study on the Statuti del Doge Tiepolo

F. Slimane; A. Mazzei; L. Tomasin; F. Kaplan

2015. Digital Humanities 2015, Sydney, Australia, June 29 - July 3, 2015.

In this paper, we propose a fully automatic system for the transcription alignment of historical documents. We introduce the ‘Statuti del Doge Tiepolo’ data, which include images as well as transcriptions of a 14th-century text written in Gothic script. Our transcription alignment system is based on a forced-alignment technique and character Hidden Markov Models, and is able to efficiently align complete document pages.

*

Anatomy of a Drop-Off Reading Curve

C. Bornet; F. Kaplan

2015. DH2015, Sydney, Australia, June 29 - July 3, 2015.

Not all readers finish the book they start to read. Electronic media allow us to measure more precisely how this “drop-off” effect unfolds as readers make their way through a book. A curve showing how many people have read each chapter of a book is likely to decrease progressively as some of them interrupt their reading “journey”. This article is an initial study of the shape of these “drop-off” reading curves.
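
Such a curve can be computed directly from reading logs; a minimal sketch assuming the log records the last chapter each reader reached (data invented):

```python
from collections import Counter

# Hypothetical logs: the last chapter each of ten readers reached.
last_chapter = [12, 3, 12, 7, 1, 12, 5, 12, 9, 2]
n_chapters, n_readers = 12, len(last_chapter)

stopped_at = Counter(last_chapter)
curve, still_reading = [], n_readers
for chapter in range(1, n_chapters + 1):
    curve.append(still_reading / n_readers)   # share who read this chapter
    still_reading -= stopped_at.get(chapter, 0)

print(curve)   # a monotonically decreasing drop-off curve
```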

*

Inversed N-gram viewer: Searching the space of word temporal profiles

V. Buntinx; F. Kaplan

2015. Digital Humanities 2015, Sydney, Australia, June 29 - July 3, 2015.

*

The DHLAB Trajectory

D. Rodighiero; A. Rigal; L. Cellard

IC Research Day 2015, SwissTech Convention Center, EPFL, Lausanne, Switzerland, June 30, 2015.

This visualisation represents the research activity of the Digital Humanities Lab through publications and co-authorship. The vertical disposition is ordered by time: each layer is a different year of publications, from the lab's foundation to the present. The layers display the collaboration networks: two researchers are linked if they published together. The vertical trajectories represent the activity of a researcher through time. Each author's position is fixed in space; as a consequence, the trajectories become a linear representation of the continuity of collaborations. The laboratory is thus transformed into a geometrical structure that evolves in time despite the instability of its membership.

*

The Analogous City, The Map

D. Rodighiero

Lausanne: Archizoom.

This new publication of The Analogous City, an artwork produced by Aldo Rossi, Eraldo Consolascio, Bruno Reichlin and Fabio Reinhart for the Venice Biennale of Architecture in 1976, is part of a museographic installation for the exhibition Aldo Rossi - The Window of the Poet at the Bonnefanten Museum in Maastricht. To gauge and explore this seminal work, Archizoom relied on Dario Rodighiero, candidate in the Doctoral Programme for Architecture and Sciences of the Cities and designer at the Digital Humanities Lab (DHLAB) at EPFL. Conceived as a genuine urban project, The Analogous City displays an aggregation of architectures drawn from collective and personal memories. What happens if we isolate the forms that Aldo Rossi and his friends so consciously placed in relation to each other? Rodighiero decomposed the work into its original references and then returned the pieces to the artwork, thus allowing us to simultaneously see the work and its visual vocabulary. An application based on augmented reality has been created to work in tandem with this publication, displaying the complete references belonging to the collage on different layers suspended over the artwork. By downloading the free application and installing it on your tablet or mobile phone, you can recreate the interaction of the museum installation whenever and wherever you are.

*

Representing the Digital Humanities Community: Unveiling The Social Network Visualization of an International Conference

D. Rodighiero

Parsons Journal of Information Mapping. 2015.

This paper deals with the sense of representing both a new domain, Digital Humanities, and its community. Based on a case study, in which a set of visualizations was used to represent the community attending the international Digital Humanities conference of 2014 in Lausanne, Switzerland, the meaning of representing a community is investigated in the light of the theories of three acknowledged authors: Charles Sanders Peirce for his notion of the interpretant, Ludwig Wittgenstein for his insights on the use of language, and Bruno Latour for his ideas on representing politics. The result is a proposal for designing and interpreting social network visualizations in a more thoughtful way, while remaining aware of the relation between objects in the real world and their visualizations. As this type of work pertains to a wider scope, we propose bringing a theoretical framework to a young domain such as data visualization.

*

Quelques réflexions préliminaires sur la Venice Time Machine

F. Kaplan

L'archive dans quinze ans; Louvain-la-Neuve: Academia, 2015. p. 161-179.

Even today, most historians are used to working in very small teams, focusing on very specific research questions. They only rarely share their notes or their data, perceiving, rightly or wrongly, that this preparatory research underpins the originality of their future work. Becoming aware of the size and informational density of archives such as Venice's should make us realize that it is impossible for a few historians working in an uncoordinated way to cover such a vast object with any systematicity. If we want to transform an archive of 80 kilometres covering a thousand years of history into a structured information system, we must develop a collaborative, coordinated and massive scientific programme. We are facing an informational entity that is too large. Only an international scientific collaboration can attempt to come to terms with it.

*

A Map for Big Data Research in Digital Humanities

F. Kaplan

Frontiers in Digital Humanities. 2015.

DOI : 10.3389/fdigh.2015.00001.

This article is an attempt to represent Big Data research in digital humanities as a structured research field. A division in three concentric areas of study is presented. Challenges in the first circle – focusing on the processing and interpretations of large cultural datasets – can be organized linearly following the data processing pipeline. Challenges in the second circle – concerning digital culture at large – can be structured around the different relations linking massive datasets, large communities, collective discourses, global actors, and the software medium. Challenges in the third circle – dealing with the experience of big data – can be described within a continuous space of possible interfaces organized around three poles: immersion, abstraction, and language. By identifying research challenges in all these domains, the article illustrates how this initial cartography could be helpful to organize the exploration of the various dimensions of Big Data Digital Humanities research.

*

Mapping the Early Modern News Flow: An Enquiry by Robust Text Reuse Detection

G. Colavizza; M. Infelise; F. Kaplan

2015. HistoInformatics 2014. p. 244-253.

DOI : 10.1007/978-3-319-15168-7_31.

Early modern printed gazettes relied on a system of news exchange and text reuse largely based on handwritten sources. The reconstruction of this information exchange system is possible by detecting reused texts. We present a method to identify text borrowings within noisy OCRed texts from printed gazettes, based on string kernels and local text alignment. We apply our method to a corpus of Italian gazettes for the year 1648. Besides unveiling substantial overlaps in news sources, we are able to assess the editorial policy of different gazettes and account for a multi-faceted system of text reuse.
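
The paper's detector combines string kernels with local text alignment; as a simplified stand-in, the sketch below normalizes OCR noise and extracts the longest shared word run between two (invented) gazette items with Python's difflib:

```python
import re
from difflib import SequenceMatcher

def normalize(text):
    # Crude OCR-noise reduction: lowercase, keep letters only, tokenize.
    return re.sub(r"[^a-zàèéìòù]+", " ", text.lower()).split()

def longest_shared_passage(a, b, min_words=5):
    """Longest run of identical words between two normalized texts."""
    ta, tb = normalize(a), normalize(b)
    m = SequenceMatcher(None, ta, tb, autojunk=False).find_longest_match(
        0, len(ta), 0, len(tb))
    return " ".join(ta[m.a:m.a + m.size]) if m.size >= min_words else None

gazette_a = "Di Venetia li 12. La flotta del Turco si è ritirata verso Candia."
gazette_b = "Si scrive che la flotta del Turco si è ritirata verso Candia."
print(longest_shared_passage(gazette_a, gazette_b))
```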

*

X-ray spectrometry and imaging for ancient administrative handwritten documents

F. Albertin; M. Stampanoni; E. Peccenini; Y. Hwu; F. Kaplan et al.

X-Ray Spectrometry. 2015.

DOI : 10.1002/xrs.2581.

‘Venice Time Machine’ is an international program whose objective is transforming the ‘Archivio di Stato’ – 80 km of archival records documenting every aspect of 1000 years of Venetian history – into an open-access digital information bank. Our study is part of this project: we are exploring new, faster, and safer ways to digitize manuscripts, without opening them, using X-ray tomography. A fundamental issue is the chemistry of the inks used for administrative documents: unlike pieces of high artistic or historical value, the composition of such items is scarcely documented. We used X-ray fluorescence to investigate the inks of four ordinary Italian handwritten documents from the 15th to the 17th century. The results were correlated with X-ray images acquired with different techniques. In most cases, iron detected in the ‘iron gall’ inks produces image absorption contrast suitable for tomography reconstruction, allowing computer extraction of handwriting information from sets of projections. When absorption is too low, differential phase contrast imaging can reveal the characters from the substrate morphology.