Methodology notes in bibliometrics
Delineation of disciplinary fields and granularity
If a precise delineation of fields is required, coarse-grained approaches usually fail to reach an acceptable trade-off between recall and precision: recalling more journals in order to include relevant articles is paid for by an excessive number of irrelevant ones. The alternative is a fine-grained delineation at the level of individual articles. Producers of indicators often rely on classical information retrieval processes based on lexical queries on titles, abstracts, or full texts. Recent advances in natural language processing techniques boost the potential of lexical methods. However, bibliometrics can also rely on kindred networks linked to the self-representation of scientific communities, especially citation relations, pioneered at ISI and Drexel University in the 1960s and 1970s (Garfield, 1970; Small & Griffith, 1974), or institutional arrangements. Citation relations in their many forms (transactional, cocitation, bibliographic coupling, etc.) are particularly powerful.
In tricky emerging areas, citation-based methods may perform well where lexical methods get trapped, but the reverse may also happen. Each network conveys a particular rationale and, of course, the outcome is also sensitive to the statistical methods, the choice of metric, and so on. How to build on the complementarity of techniques, together with expert or peer validation protocols, is a challenge for scientometrics.
(Zitt, Michel, Facing Diversity of Science: A Challenge for Bibliometric Indicators, Measurement 3(1), 38-49 (2005), http://www.obs-ost.fr/doc_attach/FacingDiversityOfScience.pdf)
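The recall-precision trade-off of journal-level delineation described above can be illustrated with a minimal sketch. The toy corpus, the journal names (J_core, J_broad) and the relevance labels are all hypothetical and chosen only to make the arithmetic visible; they do not come from the cited paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus: article id -> (journal, truly in the field?)
corpus = {
    "a1": ("J_core", True),
    "a2": ("J_core", True),
    "a3": ("J_core", False),
    "a4": ("J_broad", True),
    "a5": ("J_broad", False),
    "a6": ("J_broad", False),
    "a7": ("J_broad", False),
}
relevant = {a for a, (_, in_field) in corpus.items() if in_field}


def journal_delineation(journals):
    """Coarse-grained delineation: keep every article of the listed journals."""
    return {a for a, (journal, _) in corpus.items() if journal in journals}


# A short journal list misses a4; a longer list recovers it but adds noise.
narrow = precision_recall(journal_delineation({"J_core"}), relevant)
wide = precision_recall(journal_delineation({"J_core", "J_broad"}), relevant)
print("narrow list: precision=%.2f recall=%.2f" % narrow)
print("wide list:   precision=%.2f recall=%.2f" % wide)
```

In this toy case, adding the broader journal raises recall to 1.0 while precision drops from 2/3 to 3/7, which is the pattern that motivates article-level delineation.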
The relevance of bibliometric indicators on scientific areas critically depends on the quality of their delineation. Macro-level studies, often based on a selected list of journals, accept a high degree of fuzziness. Micro-level studies rely on sets of individual articles in order to reduce noise and enhance precision of retrieval. The most usual information retrieval process is based on lexical queries with various levels of sophistication. In the experiment on Nanosciences reported here, this process was used as a first step, to delineate a 'seed' of literature. It has strong limitations, especially for emerging or transversal fields. In a second step, the alternative approach of citation linkages was used to expand the bibliography starting from the lexical seed. The extension process presented is ruled by three parameters: two deal with the cited side (a threshold on citation score and specificity towards the field) and one with the citing side (a threshold on the number of relevant references), interplaying in the 'referencing structure' function (RSF) introduced in a previous work. This type of combination proves effective for delineating the transversal field of Nanosciences. Further improvements of the method are discussed.
(Zitt, Michel & Bassecoulard, Elise, Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences, Information Processing & Management 42(6), 1513-1531 (2006), http://www.sciencedirect.com/science/article/B6VC8-4K42DTF-1/2/63e170974f277bb3cbb0d4c7877f56f7)
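The two-step logic of the abstract (lexical seed, then citation-based expansion governed by three parameters) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation and not their RSF: the function name expand_seed, the parameter names, the way specificity is computed (share of an item's citations that come from the seed) and the toy data are all assumptions.

```python
from collections import Counter


def expand_seed(seed, references, min_citations=2, min_specificity=0.5,
                min_relevant_refs=2):
    """Expand a lexical seed through citation links (illustrative only).

    seed: set of article ids retrieved by the lexical query.
    references: dict mapping each citing article id to the set of items it cites.
    Returns (expanded article set, core cited items).
    """
    cites_from_seed = Counter()
    cites_total = Counter()
    for citing, refs in references.items():
        for cited in refs:
            cites_total[cited] += 1
            if citing in seed:
                cites_from_seed[cited] += 1

    # Cited side: keep items cited often enough by the seed (citation score)
    # and cited mostly by the seed (assumed proxy for field specificity).
    core_cited = {
        cited for cited, n in cites_from_seed.items()
        if n >= min_citations and n / cites_total[cited] >= min_specificity
    }

    # Citing side: add outside articles with enough references to the core.
    added = {
        citing for citing, refs in references.items()
        if citing not in seed and len(refs & core_cited) >= min_relevant_refs
    }
    return seed | added, core_cited


# Hypothetical toy data: s* are seed articles, x* are candidates, r* cited items.
references = {
    "s1": {"r0", "r1", "r2", "r3"},
    "s2": {"r0", "r1", "r2"},
    "s3": {"r2", "r4"},
    "x1": {"r1", "r2", "r9"},
    "x2": {"r0", "r4", "r9"},
    "x3": {"r0", "r2", "r9"},
    "x4": {"r0", "r4", "r9"},
}
seed = {"s1", "s2", "s3"}

expanded, core = expand_seed(seed, references)
print(sorted(core))      # core cited items retained on the cited side
print(sorted(expanded))  # seed plus articles admitted on the citing side
```

With these toy values, r0 is cited twice by the seed but is too generic (2 of its 5 citations come from the seed), so it is excluded by the specificity threshold; as a result only x1, which cites two core items, joins the seed. Loosening any of the three thresholds trades precision for recall.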