Miscellaneous notes on patinformatics
Patinformatics
History
H. Small and E. Garfield state that Price was the first, during the 1960s, to articulate the idea that science could be mapped [H. Small and E. Garfield, "The geography of science: disciplinary and national mappings", Journal of Information Science, vol. 11, 1985, p. 147-159. I cite here its reproduction in the 1988 Science Citation Index, published by the Institute for Scientific Information, Philadelphia (USA), p. 46-58. According to these authors: "The notion that science can be mapped was first clearly stated by Derek Price during the 1960s" (p. 46). They refer to Price's article "The Science of Scientists", published in Medical Opinion and Review, vol. 1, no. 10, 1966, p. 88-97]. A sketch of this idea already appears in his essay "The Science of Science" (1964), where he suggests that the pattern of the network of scientific papers could be studied with graph theory and matrix methods, and where he proposes the image that "papers group themselves into continents and states that can themselves be mapped".
The project becomes explicit in the last part of his article "Networks of Scientific Papers" (1965). It is on the basis of citation analysis that he envisages the possibility of drawing a topographical map of the current scientific literature, introducing at the same time the notion of centrality (centralness). He sees that it would then become possible to indicate "the overlaps and relative importance of journals, and also of countries and authors, or of individual papers, by the place they occupy in the map, and by their degree of strategic centralness" [Price, "Networks of Scientific Papers", loc. cit., p. 515].
We can thus see that, for Price, the idea of a map of the scientific literature rests on the "structural relations of the network of references and citations" [Price, "The Citation Cycle"]. The project would only be developed in the first half of the 1970s, at the Institute for Scientific Information (ISI) in Philadelphia, by Small and his collaborators, using the co-citation method, the single-link technique for building co-citation clusters, and multidimensional scaling for constructing the maps [See H.G. Small and B.C. Griffith, "The Structure of Scientific Literature I: Identifying and Graphing Specialties", Science Studies, vol. 4, 1974, p. 17-40].
In constructing maps, as Garfield, Malin and Small note, one possibility is to place the elements in a metric space in which their distances are meaningful and well defined. In practice they adopt another approach: building maps of science from ordinal data, with the help of multidimensional scaling. This is the option that was adopted in France, with some particularities, by the co-word analysis school in producing its strategic diagrams. (Polanco, Xavier, Aux sources de la scientométrie, in 2 Solaris (1995), http://biblio-fr.info.unicaen.fr/bnum/jelec/Solaris/d02/2polanco1.html)
One day later [June 1960], Allen sent his set of biochemistry articles together with a drawing of the citation relationships between them. Gordon Allen had, in other words, drawn the first citation network: "The arrows indicate the direction in which one would be led in a conventional literature search, starting at any point on the network. A citation index would permit one to trace the arrows in the opposite direction, and hence to find all the articles no matter where on the network he started." He emphasized that this small network was an extract from "a considerably more voluminous literature on the same topic, all tied together with citations". Garfield reacted strongly: "The material you sent me is magnificent! This must have been a great deal of work. It is fabulous. Why didn't we think to do this before. I didn't have this in mind when I said I had some examples of the power of the Citation Index. I merely meant specific articles which could be traced through a CI. (...) I once had the idea that some type of network theory could be used with Citation Indexes. I am now convinced more than ever, from your example, that this will be true." (Wouters, Paul, The Citation Culture, Universiteit van Amsterdam, March 1999, http://garfield.library.upenn.edu/wouters/wouters.pdf, p. 51 + fig. 2.2 p. 54)
John Desmond Bernal, who had been informed by Garfield shortly before Price but had not participated in the project, reviewed the SCI in spirited terms: "The value of the Science Citation Index was immediately apparent to me because I had tried to do the same thing in reverse order in writing about various aspects of the history of science. (...) Its essential value is, as claimed to be, that it is a new dimension in indices which should enable the polydimensional graph on the progress of science to be mapped out for the first time. Such a graph is a necessary stage in drawing up or planning any strategy for scientific research as a whole." (Bernal 1965) (Wouters, Paul, The Citation Culture, op. cit. p. 81)
Inspired by the concept of bibliographic coupling in the field of library science, the co-citation frequency was almost simultaneously and independently created by the American Henry Small (Small 1973) and the Russian Irina Marshakova (Marshakova 1973). This indicator records the number of times two publications are cited together, and is taken to be a measure of similarity of the two publications. Because one can measure co-citations at several levels of aggregation, the co-citation frequency can be used as a building block of scientific cartography: the mapping of science on the basis of co-citation links between publications (Starchild et al. 1981, Garfield et al. 1984). (Wouters, Paul, The Citation Culture, op. cit. p. 110)
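Since the co-citation frequency is the basic building block here, a minimal sketch may help. The reference lists below are invented for illustration; the pair-counting is the only part taken from the definition above (Small 1973, Marshakova 1973).

```python
from itertools import combinations
from collections import Counter

# Each citing document carries a list of the documents it references.
# (Toy data, invented for illustration.)
references = {
    "citing1": ["A", "B", "C"],
    "citing2": ["A", "B"],
    "citing3": ["B", "C"],
}

# Two documents are co-cited each time they appear together in the
# same reference list; the count is taken as a similarity measure.
cocitations = Counter()
for refs in references.values():
    for pair in combinations(sorted(set(refs)), 2):
        cocitations[pair] += 1

print(cocitations)  # {('A', 'B'): 2, ('B', 'C'): 2, ('A', 'C'): 1}
```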
The whole point of co-citation clustering is its capacity to create maps of science that can be interpreted by scientists, science managers or science policy officials. This does not mean that co-citation clustering straightforwardly reflects the true nature of science. The reality of co-citation clusters is, on the contrary, the very consequence of built-in inconsistencies and “ontological gerrymandering” (Woolgar 1991). (Wouters, Paul, The Citation Culture, op. cit. p. 116)
In 2003, a tech mining breakthrough occurred. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, John Wiley & Sons, 2005 p. 302) (cf. the PIUG conference in Chicago!)
And we offer insights into how firms can mine the information contained in patents —the greatest source of competitive intelligence on earth— to map technology trends and convergences, uncover the strategies and capabilities of friend and foe alike, and strengthen the efforts of every functional unit in the enterprise, from R&D and marketing to finance, human resources, and mergers and acquisitions. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, Harvard Business School Press, Boston, MA, 2000 p. 29)
Principle
Instead of trying to identify a single grain of sand on a vast beach, business decision-makers more and more ask information professionals to identify trends and provide general overviews to put information in context when compared to a much larger collection of materials. Instead of finding a needle in a haystack, today's searchers are becoming analysts and being asked to identify haystacks from space and then forecast whether the haystack is the beginning of a new field or the remainder from last year's harvest.
Traditional patent searching deals with the micro level, in which very small changes become extremely important and details and precision are imperatives. Patinformatics, by comparison, deals with thousands or tens of thousands of documents and, since small details will not be seen across such a vast landscape, takes a more macroscopic view of data, using different methods and reaching different conclusions. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, in 10 Searcher, 9, 28 (2002))
Rather than just classifying documents, clustering techniques can yield valuable insight into the relationships existing between different categories (or clusters) of documents, thus a clustering approach to text mining is considered more effective in a business environment, especially where patent information is regarded not only as a support for legal issues, but also as an important player within the competitive intelligence function.
[…] text mining (or clustering) should allow for the extraction of information regarding patenting trends in a more efficient way when compared to the capabilities of the standard Boolean tools which, on the other hand, allow for more precise retrieval of patent documents and other bibliographic information regarding patents (such as legal status and patent families). (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, in 25 World Patent Information, 335-342 (2003), http://dx.doi.org/10.1016/S0172-2190(03)00113-3)
Tech mining transforms textual data -thousands or even millions of chunks of data- into actionable knowledge about emerging technologies. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 74)
Clearly, it is impossible to characterize the properties of this avalanche of knowledge by any techniques other than statistical. There are simply far too many papers and patents and far too much new knowledge in thousands of different research domains for any person, group, or even institution to comprehend - it can only be measured statistically. (Narin, Francis & Olivastro, Dominic, Patent citation cycles, in 41 Library Trends, 4 (1993), http://www.findarticles.com/p/articles/mi_m1387/is_n4_v41/ai_13213269)
One key element that has come up in my discussions several times, is the fact that visualisation should be about providing a view which is not easily available otherwise - this brings in the ideas around clustering and categorisation, something that is often a challenge to achieve "manually" on the volumes of information generated in these areas. If this new view on the information feeds into a better-informed decision-making process, this will be a key "selling point" for the product. (Eldridge, Jeanette, Data visualisation tools--a perspective from the pharmaceutical industry, in 28 World Patent Information, 1, 43--49 (2006))
The approach is not unlike the one used by molecular biologists to identify the function of a new protein sequence. The theory is that if two sequences share a high degree of similarity or homogeneity between them, then the likelihood is high that the two proteins perform a similar function and are related. In a similar fashion, if two patent records have a high degree of homogeneity in their intellectually assigned codes, then the two documents are probably related. (Trippe, Anthony J., A comparison of ideologies: intellectually assigned co-coding clustering vs ThemeScape automatic thematic mapping, Proceedings of the 2001 Chemical Information Conference, Nîmes, France, 2001, http://homepage.mac.com/atrippe/B1336282893/C579890539/E1535689049/Media/A%20Comparison%20of%20Ideol)
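Trippe does not spell out a formula, but the homology analogy suggests comparing the sets of intellectually assigned codes. A minimal sketch, assuming Jaccard similarity as the measure (my choice, not necessarily the one used in the paper) and invented IPC-style codes:

```python
def code_similarity(codes_a: set, codes_b: set) -> float:
    """Jaccard similarity between the code sets of two patent records.

    A high value suggests the two documents are probably related,
    by analogy with sequence-homology reasoning.
    """
    if not codes_a or not codes_b:
        return 0.0
    return len(codes_a & codes_b) / len(codes_a | codes_b)

# Toy IPC-style code sets (invented):
patent_1 = {"C07D", "A61K", "A61P"}
patent_2 = {"C07D", "A61K", "C07C"}
print(code_similarity(patent_1, patent_2))  # 0.5
```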
Tools and methods
Use of any analysis tool typically starts from one of two points. In the first case, one has questions to answer. Here the analysis includes gathering of relevant data and the attempt to find meaningful answers from that data. In the second case, one has data to analyze and understand. This is more often the case with experiments from which data are gathered. (Boyack, Kevin W. and Wylie, Brian N. and Davidson, George S., Domain Visualization Using VxInsight for Science and Technology Management, in 53 Journal of the American Society for Information Science, 9, 764-774 (2002), http://www.cs.sandia.gov/projects/VxInsight/pubs/jasist02_prepub.pdf)
To enable a better view of the patenting dynamics, we adopted the well-known technique of further splitting the time range into smaller segments or slices. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
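A minimal sketch of the time-slicing technique mentioned above, assuming fixed-width slices over each patent's priority year; the records and slice width are invented for illustration:

```python
from collections import defaultdict

# Toy records: (patent_id, priority_year) pairs, invented for illustration.
patents = [("P1", 1994), ("P2", 1996), ("P3", 1997), ("P4", 2001)]

def slice_by_period(records, start, end, width):
    """Group records into fixed-width year slices, e.g. 1994-1998, 1999-2003."""
    slices = defaultdict(list)
    for pid, year in records:
        if start <= year <= end:
            lower = start + ((year - start) // width) * width
            slices[(lower, lower + width - 1)].append(pid)
    return dict(slices)

print(slice_by_period(patents, 1994, 2003, 5))
# {(1994, 1998): ['P1', 'P2', 'P3'], (1999, 2003): ['P4']}
```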
It is important to let the data speak for itself, as opposed to having the analysis directed by the searcher's preconceived notions while building the dataset. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
The process has a great deal to do with having a clear understanding of the business objective and desired use of the intelligence produced by the analysis. It is less a judgment call based on the analyst's understanding of the subject matter, as it is an experiment with conclusions drawn based on the results. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
In patinformatics it is absolutely essential that the business need for intelligence is clearly understood before anything else begins. It is also critical to know all of the needs behind the need as well. Analysts need to understand how the data will be used and who will use it. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
If an organisation focuses on a single analysis tool, then all subsequent analysis may be overshadowed by the strengths and weaknesses of that particular tool. As the old saying goes, "If all you have is a hammer, everything looks like a nail." (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
This idea dictates that intelligence is only useful if it is applied to a business question and more importantly used to make a business decision. Analysis work should not be done for its own sake. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
[…] many common automatic classification algorithms adopt the so-called "bag of words" approach, where classification decisions are made solely on the basis of statistical occurrences of words in certain classes without considering the semantic context of each word or the syntax of the text. (Richter, Georg & MacFarlane, Andrew, The impact of metadata on the accuracy of automated patent classification, 2004)
The most common term weight function in the field of information retrieval and automated text classification is undoubtedly term frequency/inverse document frequency (tfidf), which was developed by Salton and Buckley. (Richter, Georg & MacFarlane, Andrew, The impact of metadata on the accuracy of automated patent classification, op. cit.)
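A minimal sketch of one classic tf-idf variant (term frequency times the log of N over document frequency); the toy documents are invented, and real systems add normalization and smoothing on top of this:

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "fuel cell electrode assembly",
    "fuel cell membrane",
    "battery electrode material",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: how many documents contain each term.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tfidf(tokens):
    """Classic tf-idf: term frequency times log(N / document frequency)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

print(tfidf(tokenized[0]))
# 'assembly' (in 1 of 3 docs) outweighs 'fuel', 'cell', 'electrode' (2 of 3)
```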
Full-text, as contrasted to abstract, sources are increasingly available, and obviously highly valuable. But for our detective work, the abstract databases provide a better starting point because they concentrate tremendous quantities of raw information into well-structured records. They are ideal for tech mining. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 74 + box p. 76)
Because of the extent of American invention and the appeal of the U.S. market, if you were to pick one resource for measuring patent activity, [the USPTO database] would be a good one. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 74)
Clustering
"Mapping" désigne la représentation graphique de données groupées : cela peut-être des analyses dimensionnelles, du clustering ou des arbres (
Porter, Alan L. and Cunningham, Scott W.,
Tech mining: Exploiting New Technologies for Competitive Advantage,
op. cit. p.35 + pp. 174-178 + table 10-4)
In the physical world, maps help us to understand our environment — where we are, what is around us, and the relationships between neighboring things. By knowing about our surroundings, we are given more information by which to anticipate changes, especially those initiated in our immediate vicinity. Maps also provide a physical (geographical) structure for comparisons of metrics, such as census figures, vote tabulations, or average temperatures. Plus, maps help us navigate the landscape. (Boyack, Kevin W. and Klavans, Richard & Börner, Katy, Mapping the backbone of science, in 64 Scientometrics, 3, 351-374 (2005))
Generally speaking, a good clustering is expected to feature high values for its within and quality criteria and a low value for its between criterion. There are some caveats, for instance: a very high "within" value could mean a cluster map consisting of very small clusters (in theory, so small as to comprise just one document), and a very low "between" could well mean that all the clusters in the map are completely disconnected. In the first case, the cluster map is obviously of no value whatsoever, while in the second case the map lacks completely what possibly is the most remarkable value added information that could be obtained through clustering analysis: the indication of relationships or links between clusters.
It is the opinion of the authors that these criteria, though valuable for deeper insight and fine-tuning of the clustering process, as well as for obtaining a first gross indication of the overall quality of a clustering session, should nonetheless be handled with extreme care and, most importantly, should not be regarded as a feasible shortcut for validating a cluster map a priori. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
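Fattori et al. do not give formulas for their "within" and "between" criteria. A plausible reading, sketched below under the assumption that "within" is the mean pairwise cosine similarity inside clusters and "between" the mean similarity across clusters:

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Mean cosine similarity over all distinct pairs of rows in X
    (clusters must contain at least two documents)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    n = len(X)
    return (sims.sum() - n) / (n * (n - 1))  # drop the diagonal of ones

def within_between(X, labels):
    """'Within' = mean similarity inside clusters; 'between' = mean
    similarity between documents in different clusters."""
    labels = np.asarray(labels)
    within = np.mean([mean_pairwise_cosine(X[labels == c])
                      for c in np.unique(labels)])
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    mask = labels[:, None] != labels[None, :]
    between = (Xn @ Xn.T)[mask].mean()
    return within, between

# Toy document vectors: two tight clusters, far apart (invented).
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(within_between(X, [0, 0, 1, 1]))  # high within, low between
```

As the quote warns, maximizing "within" and minimizing "between" blindly degenerates into singleton or disconnected clusters, so these numbers guide rather than validate.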
Another IP tool that is especially useful in analyzing emerging fields such as e-commerce is a topographical or IP Landscape Map, which can reveal the terrain of competition along a variety of parameters. It can show where your competitors are spending the most R&D over time, for instance. Or it can spot areas of increased technology development in a nascent industry. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 74)
From a global viewpoint, these maps show relationships among fields or disciplines. The labels attached or embedded in the graphics reveal their semantic connections and may hint at why they are linked to one another. Furthermore, the maps reveal which realms of science or scholarship are being investigated today and the individuals, publications, institutions, regions, or nations currently pre-eminent in these areas. (Chen, Chaomei, Mapping Scientific Frontiers: The Quest for Knowledge Visualization, Springer, London, 2003)
There are a number of different techniques used for dimension reduction that result in a map layout. The most commonly used reduction algorithm is multidimensional scaling; however, its use has typically been limited to data sets on the order of tens or hundreds of items. Factor analysis is another method for generating measures of relatedness. In a mapping context, it is most often used to show factor memberships on maps created using either MDS or pathfinder network scaling, rather than as the sole basis for a map. (...) Layout routines capable of handling (...) large data sets include Pajek (...); self-organizing maps, (...) and the bioinformatics LGL (...). (Boyack, Kevin W. and Klavans, Richard & Börner, Katy, Mapping the backbone of science, op. cit.)
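A minimal sketch of the MDS layout step using scikit-learn, with an invented dissimilarity matrix; as noted above, this approach is practical only up to hundreds of items:

```python
import numpy as np
from sklearn.manifold import MDS

# Toy symmetric dissimilarity matrix between four documents (invented).
D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.8, 0.9],
              [0.9, 0.8, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])

# Reduce the pairwise dissimilarities to a 2-D map layout.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(coords)  # two tight pairs of points, far apart, mirroring the input
```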
Clustering vs. analysis as tables and charts ("dashboard")
Clearly, even though the two different analyses more or less show the same overall trends, the patent maps obtained through text mining are easier to understand, in part because they are presented in graphical rather than textual or tabular form.
In fact, the patent maps generated by text clustering allow for a better overview of the relationships between the different areas of patent activity, at the same time avoiding the work involved in using different, more detailed patent classifications, such as for example the IPC. In particular, the IPC was felt to be either too broad (at the class/subclass level) or too detailed (at the group/subgroup level) to effectively carry out an optimal patent portfolio analysis. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
Validation
Validation of science maps is a difficult task. In the past, the primary method for validating such maps has been to compare them with the qualitative judgments made by experts, and has been done only for single-discipline-scale maps. (Boyack, Kevin W. and Klavans, Richard & Börner, Katy, Mapping the backbone of science, op. cit.)
The validation step can be performed manually, i.e. by reading every document in each cluster, can rely upon some kind of calculated statistical index, be facilitated through the use of a number of different graphical tools, or any combination of the above. In any event, this difficult but necessary step is likely to be rather time-consuming, especially in the case of large and complicated patent maps. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
When a preexisting classification of the documents is already present, a number of metrics, for example the entropy, the purity, or the grain ratio, are well known in the field of document clustering for helping the analyst to perform the validation step.
Unfortunately, on a general perspective, the problem with these kinds of metrics is that they do not provide a measure of the intrinsic quality of a document clustering, rather they show the degree of alignment between the clustering and the preexisting classification.
When it comes to interpreting the results of a clustering, the effective usefulness of these metrics for competitive intelligence applications is therefore unclear.
In order to overcome these issues, we had to choose our own validation criteria on the grounds of our understanding of the subject matter involved, and then check the consistency of every cluster in the bubble maps against these criteria.
Therefore it was decided that, for a cluster to be considered valid, at least 50% of its documents had to be found to be homogeneous, where the said homogeneity had to be intellectually assessed. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
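A sketch of both ideas: purity against a preexisting classification (which, as the authors warn, measures alignment rather than intrinsic quality), and their 50% rule, where the per-document homogeneity judgments are assumed to come from intellectual assessment:

```python
from collections import Counter

def purity(cluster_labels, true_classes):
    """Fraction of documents belonging to the majority class of their cluster."""
    clusters = {}
    for c, t in zip(cluster_labels, true_classes):
        clusters.setdefault(c, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(cluster_labels)

def valid_clusters(cluster_labels, assessed_homogeneous, threshold=0.5):
    """Apply the 50%-homogeneity rule: a cluster is valid if at least
    `threshold` of its documents were (intellectually) judged homogeneous."""
    clusters = {}
    for c, ok in zip(cluster_labels, assessed_homogeneous):
        clusters.setdefault(c, []).append(ok)
    return {c: sum(v) / len(v) >= threshold for c, v in clusters.items()}

# Toy run: clusters A and B, with human homogeneity judgments (invented).
print(purity(["A", "A", "A", "B", "B"], ["x", "x", "y", "y", "y"]))  # 0.8
print(valid_clusters(["A", "A", "A", "B", "B"],
                     [True, True, False, False, False]))  # {'A': True, 'B': False}
```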
Another set of issues arise in how one depicts distances in a 2-D or 3-D mapping. Some depictions convey what's happening more effectively than others. In most situations there is not a simple "right" way, but rather several reasonable representations. Picking one or a few highly informative ways to map a data set combines science (statistics) and art (a sense of what best informs the target users). But if several topics are being compared, consistent representation is vital. In many semantic spaces, the axes themselves are arbitrary - be sure your users understand this. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 175)
See also Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit., chapter 15.
In choosing a cluster method for use in experimental IR, two, often conflicting, criteria have frequently been used. The first of these, and in my view the most important, at this stage of the development of the subject, is the theoretical soundness of the method. By this I mean that the method should satisfy certain criteria of adequacy. To list some of the more important of these:
- the method produces a clustering which is unlikely to be altered drastically when further objects are incorporated, i.e. it is stable under growth;
- the method is stable in the sense that small errors in the description of the objects lead to small changes in the clustering;
- the method is independent of the initial ordering of the objects.
These conditions have been adapted from Jardine and Sibson. The point is that any cluster method which does not satisfy these conditions is unlikely to produce any meaningful experimental results. Unfortunately not many cluster methods do satisfy these criteria, probably because algorithms implementing them tend to be less efficient than ad hoc clustering algorithms. (van Rijsbergen, C. J., Information Retrieval, Butterworths, London, 2nd ed., 1979, http://www.dcs.gla.ac.uk/Keith/Preface.html)
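Van Rijsbergen's third criterion can at least be probed empirically: cluster the same objects in shuffled orders and compare the partitions. A sketch using k-means and the adjusted Rand index (both my choices for illustration; k-means is not guaranteed to pass the test):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def order_stability(X, cluster_fn, trials=5, seed=0):
    """Empirical check of the order-independence criterion: cluster the
    same objects in shuffled orders and compare against a baseline run."""
    rng = np.random.default_rng(seed)
    base = cluster_fn(X)
    scores = []
    for _ in range(trials):
        perm = rng.permutation(len(X))
        shuffled = cluster_fn(X[perm])
        # Undo the permutation so labels line up with the originals.
        unshuffled = np.empty_like(shuffled)
        unshuffled[perm] = shuffled
        scores.append(adjusted_rand_score(base, unshuffled))
    return scores  # 1.0 on every trial = order-independent in practice

X = np.random.default_rng(1).normal(size=(60, 4))
print(order_stability(X, lambda A: KMeans(n_clusters=3, n_init=10,
                                          random_state=0).fit_predict(A)))
```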
Citation trees and value estimation
The usefulness of U.S. patent citations as an indicator of overall value has long been debated and is still highly in doubt. […] do not focus on the inherent value of a patent, but instead examine the relationships that patents have to one another and the implied relationships that organizations have to each other when either the organization or a patent examiner cites one document with respect to another. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
One of the most powerful tools for developing an IP-based Fix strategy involves tapping the asset value of your patent portfolio itself. Chapter 5, for example, shows how a Patent Citation Tree can help identify potential targets for a patent licensing program aimed at generating new revenues. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 79)
It's worth noting, however, that the more often a patent is cited (…) the more fundamental it is and therefore the harder to get around without infringing. Furthermore, whenever a company cites a number of your patents, the likelihood of infringement increases. Finally, the further afield from your original application a firm's use of similar technology is, the more likely it is infringing-or the more likely it is that the firm is a good licensing prospect or joint venture partner. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 129)
The more profitable the invention is, the longer will the patent life probably be extended. Consequently, the patent life will be an indication of importance. (Basberg, Bjorn L., Patents and the measurement of technological change: A survey of the literature, in 16 Research Policy, 131-141 (1987) p. 136)
Foreign patents are used as technology indicators because, on average, they are expected to be of a higher quality than domestic patents. (Basberg, Bjorn L., Patents and the measurement of technological change: A survey of the literature, op. cit. p. 136)
[Reference "stock2006" not resolved]:
DIALOG, Questel-Orbit, and STN already have mastered direct citations thus far; desiderata are patent co-citations and bibliographic coupling of patents. Bibliographic coupling (Kessler, 1963) means that two articles (A and B) are coupled when they reference to the same document (C). Bibliographic coupling of patents means, accordingly, that two patents A’ and B’ are coupled when they name the same Patent C’ under ‘Cited Patents.’ A useful variant on this is the bibliographic coupling of patent assignees. This type considers in C not individual patents but their applicants. Two patents are coupled when they cite both patents of the same assignee (Huang, Chiang, & Chen, 2003). Co-citation (Small, 1973) means that two documents (Y and Z) are co-cited when they are both referenced to in an Article X. Patent co-citation means, accordingly, that two patents Y’ and Z’ are co-cited, when they occur together in the ‘Cited Patents’ field of patent X’ (Chen, 2003, pp. 161 ff.; Mogee & Kolar, 1999). Here, it is useful as well to consider the assignee. Co-assignee citation means the citation of patents of different assignees by one patent.
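A minimal sketch of bibliographic coupling of patents as defined above, with invented 'Cited Patents' fields; the coupling strength is taken here to be the size of the shared reference set:

```python
from itertools import combinations

# 'Cited Patents' field per patent (toy data, invented).
cited = {
    "A'": {"C'", "D'"},
    "B'": {"C'", "E'"},
    "F'": {"E'"},
}

# Bibliographic coupling (Kessler 1963): two patents are coupled when
# their reference lists share at least one cited patent.
coupling = {}
for (p, refs_p), (q, refs_q) in combinations(cited.items(), 2):
    shared = refs_p & refs_q
    if shared:
        coupling[(p, q)] = len(shared)

print(coupling)  # {("A'", "B'"): 1, ("B'", "F'"): 1}
```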
A simple method to receive additional information about the effect of a patent is to count the number of citations per patent. Since patents from different times show a varying probability of being cited, the indicator can be narrowed to citations per patent and per year. This indicator becomes extremely useful in searches for patent assignees, inventors, or technical field (e.g., by IPC or ECLA), so that now the annual average citation rate of assignees, inventors, or technical fields is calculated. In patent informetrics (Hall et al., 2005; Narin, 1994), further characteristic values are possible, such as the Current Impact Index of a company (i.e., number of citations of a company’s patents with priority from the last 5 years during the year under review, related to the average citation rate of the corresponding technical discipline) or Science Linkage (i.e., average number of a patent’s references to scientific literature), which are both used at CHI Research (Narin, 1999).
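The per-year normalization is simple enough to sketch; the figures are invented, and the point is only that equal raw counts mean different things for patents of different ages:

```python
def citations_per_year(cites_received: int, grant_year: int,
                       current_year: int) -> float:
    """Citations per patent per year, to compare patents of different ages."""
    age = max(current_year - grant_year, 1)
    return cites_received / age

# Toy comparison: an old and a recent patent with equal raw counts.
print(citations_per_year(40, 1990, 2006))  # 2.5 per year
print(citations_per_year(40, 2002, 2006))  # 10.0 per year: a stronger signal
```

The same quantity averaged over an assignee, inventor, or IPC/ECLA field gives the annual average citation rates the quote describes; the Current Impact Index then relates such a rate to the field's average.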
Self-citations
We also proposed that the fraction of "self-citations" - citations that come from patents assigned to the same organization - was an indicator of the originating organization's successful appropriation of the subsequent fruits of that research. The data confirm that this fraction was much higher for firms than for universities, and it was higher for large firms than for small firms (as high as 25 percent for the largest firms). (Jaffe, Adam B., Patents, Patent Citations, and the Dynamics of Technological Change, NBER, 1998, http://www.nber.org/reporter/summer98/jaffe_summer98.html)
The mean percentage of self-citations is 11% for the lower bound, and 13% for the upper bound. However, there are wide differences across technological fields, as shown in Figure 12. The fact that the percentages are much higher in Chemical and in Drugs corresponds well with what we know about these fields: innovation is concentrated in very large firms, and hence the likelihood that they will cite internally is higher. (Trajtenberg, Manuel & Jaffe, Adam B. and Hall, Bronwyn H., The NBER/Case Western Patents Data File: A Guided Tour, NBER, 2000, http://emlab.berkeley.edu/users/bhhall/papers/TrajtenbergJaffeHall00%20patdata.pdf)
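A sketch of the self-citation fraction for a single patent, assuming assignee names are already harmonized (the lower and upper bounds in the quote arise precisely because assignee matching is ambiguous in the real data):

```python
def self_citation_share(citing_assignee: str, cited_assignees: list) -> float:
    """Fraction of a patent's backward citations that point to patents
    assigned to the same organization."""
    if not cited_assignees:
        return 0.0
    own = sum(1 for a in cited_assignees if a == citing_assignee)
    return own / len(cited_assignees)

# Toy example (assignee names invented):
print(self_citation_share("ACME", ["ACME", "Globex", "ACME", "Initech"]))  # 0.5
```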
Visualisation
Presenting retrieval results in a list has been the standard way of communicating the contents of a document database to the user. This is in many ways a constraint on understanding not only effects of the request the user posed, but also a limitation on the user view of the document space and interrelations between items in it. There are numerous visualization tools available that provide alternative views of retrieval results, but their presumed beneficial effects are hard to evaluate, the communicative inventions they make use of are ad hoc, and — on a more trivial level — they are usually not portable enough to be made available more than in specific situations. But most importantly, usually visualization tools in retrieval systems only communicate the same data that the list does in some slightly more accessible form. To be useful, visualization tools need to add substantially more value to the interaction. (kalgreen2000 [reference not resolved])
In positional methods, the proximity of items on the map is meant to express their multidimensional similarity. MDS allows an optimal representation in n dimensions, especially two, but the projection stress often makes the interpretation difficult for final users. This is also true for factor techniques, which have the advantage of revealing latent dimensions. (Zitt, Michel, Facing Diversity of Science: A Challenge for Bibliometric Indicators, in 3 Measurement, 1, 38--49 (2005), http://www.obs-ost.fr/doc_attach/FacingDiversityOfScience.pdf)
The need to understand how these tools work…
[…] experienced users need to be able to target results with a full understanding of how the systems work, otherwise there is too much uncertainty and risk in the perceived accuracy of the results. (Akers, Lucy, The future of patent information---a user with a view, in 25 World Patent Information, 303-312 (2003))
[…] there still is a high degree of scepticism as regards the use of these new linguistic technologies. At least in part, this is due to the relative "black box" effect inherently attached to the nature of the said technology. Not surprisingly then, professional patent searchers are rather suspicious of tools that do not generally grant the user complete control over their inner workings. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
Citations in patents
One can gauge the pace of innovation by measuring median time lags from citing applications to cited patent grant dates. […] citation rates vary by year of issue (earlier patents have had more time to be cited) and by technology area. Normalization is necessary. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 238)
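A minimal sketch of the median-lag indicator, with invented grant and application years:

```python
import statistics

# Toy (cited_grant_year, citing_application_year) pairs, invented.
lags = [citing - cited for cited, citing in
        [(1995, 1999), (1995, 2001), (1996, 1998), (1997, 2005)]]

# The median lag from cited grant to citing application gauges the pace
# of innovation; compare technology areas only after such normalization,
# since raw citation counts systematically favor older patents.
print(statistics.median(lags))  # 5.0 years
```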
[The citation] information gives the closest state of the art detected by a patent examiner during the statutory search. When it takes him about 5 min to classify a patent according to the IPC scheme, his search lasts for about 1 day. In other words the work involved in producing the citation field represents the biggest added value to a patent document. It is therefore legitimate to use this field as much as possible. (Schwander, Paul, An evaluation of patent searching resources: comparing the professional and free on-line databases, in 22 World Patent Information, 147-165 (2000))
Distinguish between:
1) everyday use of citations (as a search tool, as indicated by Garfield: reverse searching, understanding the field of the invention);
2) citation studies, whose aims can be:
a) scientometrics (evaluation of a research organization, study of the links between research and technology…);
b) IP management & competitive intelligence (portfolio value evaluation, competitor tracking, licensing opportunities…).
Little has been published on the cognitive and sociological functions of citations in patents (Collins & Wyatt, 1988). This is in marked contrast to the many studies on citer motivations in journal articles (Cronin, 1984). (Oppenheim, Charles, Do Patent Citations Count?, in Cronin, Blaise & Barsky Atkins, Helen (ed.), The Web of Knowledge: A Festschrift in honor of Eugene Garfield, Information Today, Inc., 405, 2000)
Some of the citations made by examiners appear bizarre to librarians or information scientists. Garfield (1979, p. 39) commented cautiously that “there are no categorical answers” to the question of whether items cited by patent examiners are indeed relevant to the subject area. Garfield (1966), Oppenheim (1976), Dunlop & Oppenheim (1980) and van Dulken (1999) provide evidence that many examiner citations are inappropriate. […] only about one third of all examiner citations have a close relationship to the subject matter of the citing patents, although they are in the same broad technologies. […] One particularly interesting result was that where two patents in the same family were checked, one an EPO patent and the other a US patent, more than 75% of the references cited were different. (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
Despite all this confusion, there can be no disputing Garfield’s statement (Garfield, 1979) that “a citation index of the patent literature identifies relationships between patents that are not identified any other way.” (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
The approach taken in this paper is to break down the use of patent citation analysis into five sub-headings:
- Evaluation of the performance of an industry or a country’s technology.
- Tracing the transfer of knowledge from science to technology, from technology to technology, or from the defence to civil fields. This is currently the most popular area for research.
- Identifying key earlier patents for patent litigation purposes, for identifying the history of a technical subject, and in particular for identifying key pioneer patents.
- Identifying the speed of development of a technical subject.
- Miscellaneous applications. (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
Both Trajtenberg (1990) and Albert et al. (1991) point out that the technological impact and commercial value of patents are two separate issues. Because of the difficulties of separating out these two factors, most researchers are content to rely on the use of terms such as “importance”. As Narin, Noma & Perry (1987) point out, patents are probably highly cited for two sometimes interrelated reasons. The first is that they are seminal patents; this implies that the originating company will have a disproportionate share of that technology. Secondly, high citations are often due to follow up patents from the same company. That is, highly cited patents are often part of a tightly interlocked stream of inventions from the company. Overall, then, the evidence is still inconclusive, but there is some evidence that highly cited patents are indeed those that are technologically or economically important. (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
However, the most detailed criticism of patent citation analysis can be found in Kaback, Lambert & Simmons (1994). These three authors are extremely experienced patent searchers from the chemical and pharmaceutical industries, and their criticisms need to be considered carefully. They emphasise that patents are not governed by the same rules of etiquette as journal articles. The references that are made by the applicant rarely look like the bibliography of a journal article. The authors of the patent application wish to avoid any implication that the current patent application grew naturally out of earlier work. Thus, most of the prior art that is cited by the applicants relates to unsuccessful approaches to the question. Turning to the examiner citations, these are driven solely by the claims in the applicant’s patent specification. The claims precisely define the monopoly right that the applicant wishes to gain. The examiner is only required to cite one reference that anticipates the claim in some way. The examiner may add further citations, but the primary function of the citations remains the same – to prove that what is claimed is not new. This point should be stressed: the text of the patent claims is not identical to, and does not necessarily reflect, the text of the remainder of the patent specification. The examiner’s preoccupation with the claims means that the items cited by the examiner do not necessarily reflect the bulk of the patent specification. […] The authors conclude that patent citations are useful as subject matter search tools, just as Garfield suggested. They also agree that frequently the highly cited patents are indeed the industrially important breakthroughs. In summary, they warn strongly against simplistic use of citation counts. (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
Patents provide far fewer citations than journal articles, so the possibilities for statistical analysis are limited. There is no evidence that either examiner or applicant citations reflect the subject matter of the citing patent. Their reasons for citing are different, but neither is to do with providing a useful background literature survey. In any case, there is a paucity of understanding about examiner and applicant citing motivations. (Oppenheim, Charles, Do Patent Citations Count?, op. cit.)
The use of citations as a quality indicator of patents is advocated by empirical research, which has established a positive relationship between the citation frequency of patents and different measures of commercial success. (Ernst 1998, p. 7)
Information retrieval and data mining
At a crossroads: discovering new opportunities and new uses (undiscovered public knowledge)
Swanson and Smalheiser (1997) defined non-interactive literature as two literatures that have not been connected by a significant citation tie. In other words, scientists in neither camp have regarded the existence of a meaningful connection between the two literatures. A key step in Swanson's methodology is the identification of the two premises A -> B and B -> C. In a large knowledge domain, identifying two such premises is like searching for needles in a haystack. Knowledge visualization aims to capture the structure of a knowledge domain and increase the chance of finding something useful. (Chen, Chaomei, Mapping Scientific Frontiers: The Quest for Knowledge Visualization, op. cit. p. 194)
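The A -> B and B -> C step can be caricatured in a few lines: treat each literature as a bag of terms and look for shared intermediates. The terms below echo Swanson's classic fish-oil/Raynaud case, but the matching logic is a deliberate oversimplification of literature-based discovery:

```python
# Swanson-style ABC discovery: literatures A and C never cite each other,
# but may be linked through intermediate concepts B mentioned in both.
literature_a_terms = {"fish oil", "blood viscosity", "platelet aggregation"}
literature_c_terms = {"raynaud's syndrome", "blood viscosity", "vasoconstriction"}

# Candidate B concepts are the terms the two literatures share.
bridging_terms = literature_a_terms & literature_c_terms
print(bridging_terms)  # {'blood viscosity'}
```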
Deploying these new tools: strategic information
There are six different types of patent information users, each with particular needs and expectations: attorneys, IP managers, product managers, new product developers, strategic planners, and managers (cf. Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. pp. 240-243).
It is important that a science map be as accurate as possible when used in a decision-making context within the S&T enterprise. (...) To allow our maps to be used in the decision-making process, we have embarked on a project to make them as accurate as possible. By accuracy, we mean that journals within the same subdiscipline should be grouped together, and groups of journals that cite each other should be proximate to each other on the map. (Boyack, Kevin W. and Klavans, Richard & Börner, Katy, Mapping the backbone of science, op. cit.)
Within the organization
Potential tech mining users don't fall neatly into single organizational units. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. ch. 14)
[…] executive selection processes at companies like General Motors favor those acculturated into "business" over managers with technical background. The resulting lack of managerial familiarity with technologies reinforces tendencies to rely more on marketing and financial information in reaching decisions. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 293)
Although the "centralizd vs. decentralized" argument always has two sides, we believe that somme centralization of tech mining makes good sense. External R&D publication and patent abstracts tend to be costly information resources. Capturing benefits from these databases gains from having a core of knowledgeable persons who build up experience in handling those data. In addition, one doesn't want the organization paying multiple times for the same data. A centralized unit can bargain for the best deal in licensing databases and help ensure data quality. Centralized operations also facilitates sharing and retention of tech mining results via intranet sites, report series, etc. In addition, centralization facilitates sharing of tech mining experiences to support continued learning, effective training, and mutual reinforcement. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 300)
Companies like AT&T, IBM, and Lucent have been taking a close strategic look a their patents. Because the people involved in this analysis have typically been the CEOs and VPs of the companies, it has been necessary to look at patents "from 30,000 feet", instead of the intricate analysis typically done by patent professionals. Patent mapping offers this 30,000-foot view. The maps give CEOs and VPs a way of looking at, and understanding, the patent landscape without having to understand all of the intricacies of patent law. (
Boulakia, Charles,
Patent mapping,
ScienceCareers.org,
oct,
2001,
http://sciencecareers.sciencemag.org/career_development/previous_issues/articles/1190/patent_mapping )
Information when analyzed becomes intelligence. Intelligence is the main work product from an analyst. (Trippe, Anthony J., Patinformatics: Identifying Haystacks from Space, op. cit.)
Regarding the kind of information the respondent would need, 15% to 19% of the companies apparently have access to all the information they need. For the rest, information on competitors and markets is the most important in terms of need. (Doornbos, Rob & Gras, Renske & Toth, Jozsi, Usage Profiles of Patent Information Among Current and Potential Users, European Patent Office, 2003, http://www.european-patent-office.org/news/info/survey2003/epo_user_survey.pdf)
[…] patent data must be understood as a strategic information source, which contributes important information to the effective and efficient management of technology. This type of patent information addresses two major groups of recipients inside and outside the organization. First, it addresses decision makers from senior management inside the firm who make important strategic decisions, e.g. on the R&D budget or on M&A. […] Second, strategic patent information addresses external stakeholders and analysts whose perception of the firm's technological competence can have a major impact on the firm's stock market performance. […]
The retrieval and evaluation of patent data should be institutionalized within the organization in order to ensure the continuous and systematic use of patent information in a company's decision-making processes. […] The above mentioned types of recipients of patent information at the senior management level and outside the firm require that complex and rich patent information is presented in a manner that is familiar to the target audience. (Ernst, Holger, Patent information for strategic technology management, in 25 World Patent Information, 233-242 (2003))
This book emphasizes that tech mining is more than [delivering information and analyses reports]; in particular, the radical message to some analysts is that [engaging tech mining doers and users to collaborate in focusing, interpreting, and acting on tech mining results], interpersonal process, merits major attention. […] Tech mining activities vary considerably as you adapt them to your organization's needs. […] A key principle: good tech mining exploits multiple information sources with multiple tools. In particular, empirical sources (database mining) need to be complemented by expert opinion. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 20)
[…] tech mining principles and outputs are not generally familiar. As with any innovative approach, the burden is on the innovators to convince the establishment of their legitimacy and value. In presenting tech mining "answers", do your homework thoroughly. Know how familiar your intended users are with these sorts of analyses. If they are relatively unfamiliar, explain what has been done with extra care and thoroughness. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. p. 282)
But most businesses ignored this gold mine of patent data - "the greatest source of competitive intelligence on earth," he called it. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit.)
Of course, the aim is to enable the analytical tools and/or the analyzed information to be shared between information scientists and researchers, but there may be restrictions applied through the subscription and licence agreements which impact the desire to share these tools and their outputs, and prohibitive costs per user or for the initial download of datasets may influence their roll out within an organisation. (Eldridge, Jeanette, Data visualisation tools--a perspective from the pharmaceutical industry, op. cit.)
A further criterion to identify is how the package compares to other tools already established within the user population (we need to bear in mind the barriers which often exist to adoption of new tools and processes, where there are significant changes or retraining required), and how it might be integrated with other tools already in use, to improve our overall processes and decision-making (...). (Eldridge, Jeanette, Data visualisation tools--a perspective from the pharmaceutical industry, op. cit.)
Managing a patent portfolio
Along with this explosion of patents has come a boom in the revenues derived from patent licensing, as companies realize that intellectual property is among their most valuable and fungible of assets [remember: we are in a knowledge-based, globalized economy]. Patent licensing revenues have shot up 700 percent in the past eight years alone, from $15 billion in 1990 to well over $100 billion in 1998. And experts point out that the licensing market is still only in its infancy. Revenues, they say, could top a half-trillion dollars annually by the middle of the next decade. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 5)
How does one actually develop an effective patent strategy? The first step is to get a handle on the size and strength of the patent assets themselves. This step may seem elementary, but as you will read in numerous case studies throughout this book, most firms have little or no idea of the commercial potential and financial value of their IP assets. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 65)
One technique that Xerox may employ in its licensing effort is to identify groups of patents within its portfolio that could be licensed together as a package. This can be done by analyzing co-citation clustering patterns. That is, if other firms consistently cite a cluster of Xerox patents in their own, then it may be advantageous for XIPO to package those patents and market them as a group. The marketing of technology assets was previously unheard of at Xerox. (Rivette, Kevin G. and Kline, David, Rembrandts in the Attic: Unlocking the Hidden Value of Patents, op. cit. p. 127)
- Examples of use: a study comparing public- and private-sector patent portfolios in agricultural biotechnology (Graff, Gregory D. and Cullen, Susan E. and Bradford, Kent J. and Zilberman, David & Bennett, Alan B., The public-private structure of intellectual property ownership in agricultural biotechnology, in 21 Nature Biotechnology, 9, 989-995 (2003)); a study of one company's technology trends and shifts (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.); a study of fuel cell technology, its evolution and its players (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit.)
- Theoretical application: managing M&A more effectively, cf. Ernst, Holger, Patent information for strategic technology management, op. cit.
Conclusion
The future of patent mapping
These sophisticated data manipulation tools and methods will ensure that IP information can be targeted and used properly for IP strategy development and management throughout an organization. (Akers, Lucy, The future of patent information---a user with a view, op. cit.)
There is a strong need to further improve our ability to digest the large and ever-increasing volume of patent information. The trend in meeting this need is illustrated by the rapidly increasing number of data visualization and mining tools becoming available. (Akers, Lucy, The future of patent information---a user with a view, op. cit.)
Therefore, the next generation of text mining tools for patent analysis should integrate some kind of facility for manipulating patent classifications and other descriptive indexes or terms, to speed up the whole process, at the same time guaranteeing a professional quality of results. (Fattori, Michele & Pedrazzi, Giorgio & Turra, Roberta, Text mining applied to patent mapping: a practical business case, op. cit.)
Of course, ways to capture not only specific compound information for R-group analysis, but ways to capture biological data, and ways to compare and cluster similarities between generic representations would be a real boon. (Eldridge, Jeanette, Data visualisation tools--a perspective from the pharmaceutical industry, op. cit.)
The need to stay up to date
Keeping up with the application of pioneering technologies will become increasingly challenging for patent information professionals. (Akers, Lucy, The future of patent information---a user with a view, op. cit.)
The danger of outsourcing
The Twenty-First Century Strategic Plan of the USPTO, launched in 2002, raised the spectre of a massive increase in outsourcing of patent search effort. (Adams, Stephen, Certification of the patent searching profession---a personal view, in 26 World Patent Information, 79-82 (2004))
The need to reach all users, meet expectations, and diversify activities
"the challenge for the information professional is—can we respond quickly and adapt to new market circumstances? Are we entrepreneurial in our approach to information services delivery to meet our users needs?" […] the challenge we face today is to replace our relatively passive approach of waiting for clients to come to us by focusing on disseminating patent information more effectively. (Akers, Lucy, The future of patent information---a user with a view, op. cit.)
The patent search professional is today faced with an ever more complex array of tools, which need to be developed, assessed and—in some cases—rolled out to a wider user community. The services of the search professional, long dedicated to tactical functions, are beginning to contribute to the strategic commercial planning and development of industry. (Adams, Stephen, Certification of the patent searching profession---a personal view, op. cit.)
Somewhat surprisingly, many information professionals have not yet "gotten it" - the idea of gaining metaknowledge from analyses of collective bodies of S&T information. […] Today's information professionals, likewise, have to redefine their profession. They need to take into account users directly accessing electronic resources. Information professionals can (and are) developing new opportunities for themselves to contribute to information assurance and analyses. (Porter, Alan L. and Cunningham, Scott W., Tech mining: Exploiting New Technologies for Competitive Advantage, op. cit. ch. 14)