Because of the enormous amount of research information generated every day in the form of research articles, patents, white papers, blogs, etc., it is becoming increasingly difficult to absorb it and to search for relevant information. Citation networks are one kind of social network that has been studied quantitatively almost from the moment citation databases first became available. A citation network is a social network whose nodes are papers (sources) linked by citation relationships.
Derek J. de Solla Price described the inherent linking characteristic of the Science Citation Index (SCI) in his seminal paper titled “Networks of Scientific Papers”. Egghe & Rousseau proposed that when a document di cites a document dj, we can show this by an arrow going from the node representing di to the node representing dj. In this way the documents from a collection D form a directed graph, which is called a ‘citation graph’ or ‘citation network’. Citation networks can also be built over other entities such as patents, court cases, blogs, Wikipedia references, etc.
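The directed-graph definition above can be sketched in a few lines of Python. This is only an illustration: the function name `build_citation_graph` and the toy document ids are assumptions, not part of any system described on this page.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build a directed citation graph from (citing, cited) pairs.

    An arrow goes from the citing document d_i to the cited document d_j.
    """
    out_edges = defaultdict(set)  # d_i -> documents that d_i cites
    in_edges = defaultdict(set)   # d_j -> documents that cite d_j
    for citing, cited in citations:
        out_edges[citing].add(cited)
        in_edges[cited].add(citing)
    return out_edges, in_edges


# Hypothetical toy collection D of four documents.
citations = [("d2", "d1"), ("d3", "d1"), ("d3", "d2"), ("d4", "d3")]
out_edges, in_edges = build_citation_graph(citations)

# In-degree of a node is the citation count of that document.
citation_counts = {doc: len(citers) for doc, citers in in_edges.items()}
```

Keeping both edge directions makes the two common queries cheap: the reference list of a document (`out_edges`) and the set of documents citing it (`in_edges`).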
- To understand the citation evolution process.
How does a citation network evolve over time? Do new incoming nodes always attach preferentially to old existing nodes? Why do nodes start losing popularity? Who benefits if a node loses popularity? These questions have occupied the research community for a long time. The rate at which nodes in evolving citation networks acquire links shows complex temporal dynamics. Preferential attachment and link copying models, while enabling elegant analysis, only capture rich-gets-richer effects, not aging and decline. Recent aging models are complex and heavily parameterized; most involve estimating 1–3 parameters per node. These parameters are intrinsic: they explain decline in terms of events in the past of the same node, and do not explain, using the network, where the linking attention might go instead. We argue that traditional characterizations of linking dynamics are insufficient to judge the faithfulness of models. We propose a new family of frugal aging models with no per-node parameters and only two global parameters. Despite having very few parameters, the new family of models shows a remarkably better fit with real data.
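To make the contrast between rich-gets-richer growth and aging concrete, here is a minimal simulation sketch. It is a generic textbook-style illustration, not the family of models proposed in this work: the exponential decay, the parameter `tau`, and the function name `simulate_citations` are all assumptions chosen for the example.

```python
import math
import random


def simulate_citations(n_nodes, refs_per_node=3, tau=None, seed=0):
    """Grow a citation network one node per time step.

    Each new node cites up to `refs_per_node` distinct older nodes, chosen
    with probability proportional to (in_degree + 1). If `tau` is given,
    that weight is multiplied by exp(-age / tau), a simple aging term.
    Returns the list of in-degrees (citation counts), indexed by node id.
    """
    rng = random.Random(seed)
    in_degree = [0]  # node 0 exists at time 0
    for t in range(1, n_nodes):
        candidates = list(range(t))
        weights = []
        for v in candidates:
            w = in_degree[v] + 1  # rich-gets-richer term
            if tau is not None:
                w *= math.exp(-(t - v) / tau)  # older nodes attract less
            weights.append(w)
        k = min(refs_per_node, t)
        targets = set()
        while len(targets) < k:  # resample until k distinct targets
            targets.add(rng.choices(candidates, weights=weights)[0])
        for v in targets:
            in_degree[v] += 1
        in_degree.append(0)  # the new node starts with no citations
    return in_degree


pure_pa = simulate_citations(200)             # preferential attachment only
with_aging = simulate_citations(200, tau=20)  # preferential attachment + aging
```

Comparing the two in-degree lists (for example, how much of the total goes to the earliest nodes) gives a feel for why pure preferential attachment cannot reproduce decline. Note that the aging term here is intrinsic to each node's age and says nothing about where the lost attention goes, which is exactly the limitation discussed above.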
- To predict the future popularity of academic entities.
With the exponential growth of research volume in recent decades, academic entities like articles, authors, venues, organisations, fields, etc. have evolved qualitatively and quantitatively. The scientific community has always demanded better algorithms, metrics and features for ranking and categorizing academic entities, which makes understanding and estimating the popularity of these entities an interesting and well-researched problem. We study several factors that influence the popularity of research articles. Specifically, we use information generated immediately after an article's publication to estimate its long-term popularity. This information includes both network-based and content-based information.
- To study interesting structural properties to better understand real phenomena.
Structural properties of a network are defined by the real interactions happening between individuals. We study these interactions and the related structural properties for citation and collaboration networks. We are particularly interested in the local context of a node in a collaboration network, which can help explain the behavior of an author both as an individual within the group and as a member acting along with the group. The best representations of such local contextual substructures in a collaboration network are “network motifs”. We study the characteristic features of motifs and show how they are related to goodness measures. We also work on community identification problems. Most community detection algorithms are based on optimizing a combinatorial parameter, for example modularity. This optimization is generally NP-hard, so practical algorithms are heuristic, and merely changing the vertex order can alter the assignment of vertices to communities. However, there has been little study of how vertex ordering influences the results of community detection algorithms. Here we identify and study the properties of invariant groups of vertices (constant communities) whose assignment to communities is, quite remarkably, not affected by vertex ordering.
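One simple way to look for groups of vertices that stay together regardless of vertex ordering is to run a detector several times and keep the vertices that always share a community. The sketch below assumes the partitions have already been produced by some community detection algorithm run under different vertex orderings; the function name `constant_groups` and the toy labels are hypothetical, and this is not claimed to be the exact procedure used in the published work.

```python
from collections import defaultdict


def constant_groups(partitions, min_size=2):
    """Find groups of vertices that are co-assigned in every partition.

    `partitions` is a list of dicts mapping vertex -> community label,
    one dict per run. Labels need not be comparable across runs: two
    vertices are always together exactly when their tuples of labels
    across all runs are identical.
    """
    signature_to_vertices = defaultdict(list)
    for v in partitions[0]:
        signature = tuple(p[v] for p in partitions)
        signature_to_vertices[signature].append(v)
    groups = [sorted(g) for g in signature_to_vertices.values() if len(g) >= min_size]
    return sorted(groups)


# Hypothetical output of three runs with different vertex orderings.
runs = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0},
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},
]
groups = constant_groups(runs)
```

In this toy input, vertex `c` switches sides between runs, so only `a, b` and `d, e` survive as invariant groups.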
- To collect, search and recommend from the vast volume of scientific research.
The overwhelming number of scientific articles published over the years calls for smart automatic tools to facilitate the process of literature review. Extracting metadata, structure and bibliography information from scholarly articles is crucial to a variety of retrieval, analysis and recommendation problems pertaining to scientific digital libraries and scholarly search engines. At the same time, it is a difficult problem because of the inherent errors in OCR processing and the diversity of formatting styles. Current state-of-the-art frameworks tackle this extraction problem using several commercial tools and machine learning models. However, most of this research is limited to metadata extraction, and its performance is confined to a few specific publishing formats and does not generalize well. We also propose, for the first time, a framework for faceted recommendation of scientific articles (abbreviated as FeRoSA) which, apart from ensuring quality retrieval of scientific articles for a query paper, also efficiently arranges the recommended papers into different facets (categories). Providing users with an interface that enables filtering of recommendations across multiple facets can increase users’ control over how the recommendation system behaves. FeRoSA is built on a random-walk-based framework over an induced subnetwork consisting of nodes related to the query paper in terms of either citations or content similarity.
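The idea of ranking papers by a random walk over a subnetwork around the query paper can be sketched with a generic random walk with restart. This is not the FeRoSA implementation: facets are not modeled, and the function name, the restart probability of 0.15, and the toy graph are assumptions made for illustration.

```python
def random_walk_with_restart(neighbors, query, restart=0.15, iterations=50):
    """Score nodes by a random walk that keeps jumping back to `query`.

    `neighbors` maps every node of the induced subnetwork to a list of its
    neighbors (all of which must also be keys of `neighbors`). At each step
    the walker restarts at `query` with probability `restart`, otherwise it
    moves to a uniformly chosen neighbor. Scores always sum to 1.
    """
    score = {v: 0.0 for v in neighbors}
    score[query] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in neighbors}
        for v, nbrs in neighbors.items():
            moving = (1.0 - restart) * score[v]
            if not nbrs:
                nxt[query] += moving  # dead end: send the walker back to the query
                continue
            share = moving / len(nbrs)
            for u in nbrs:
                nxt[u] += share
        nxt[query] += restart  # restart mass (total score is 1)
        score = nxt
    return score


# Hypothetical subnetwork: q is the query paper; edges stand for citation
# or content-similarity links, treated here as undirected.
graph = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q"], "c": ["a"]}
scores = random_walk_with_restart(graph, "q")
ranked = sorted((v for v in graph if v != "q"), key=scores.get, reverse=True)
```

Papers closer to the query, or reachable through more paths, accumulate higher scores; `ranked` lists the candidate recommendations in that order.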
- To understand the review process
A ‘peer-review system’, in the context of judging research contributions, is one of the prime steps undertaken to ensure the quality of the submissions received; publication houses spend a significant portion of their publishing budget on the successful completion of peer review. Nevertheless, the scientific community is largely reaching a consensus that the peer-review system, although indispensable, is flawed. A very pertinent question therefore is “could this system be improved?”. We conduct several studies that can lead to a better peer-review system.
- To predict citation link formations from conference interactions
Recently, conference publications have gained wide popularity, especially in the domain of computer science. In conferences, the opportunity for personal interaction between fellow researchers opens up a new dimension for the evolution of the citation network. Here we propose a generic multiplex network framework to uncover the influence of interactions at a conference on the appearance of new citation links in the future. We crawl the DBLP citation dataset and perform a case study on the leading conferences in the “Artificial Intelligence”, “Hardware & Architecture”, “Human-Computer Interaction” and “Networking & Distributed Systems” domains. Our empirical study identifies a significant number of “successful” conference interactions which eventually result in “induced” citations. Interestingly, we find that in most cases it takes just 3 to 4 years to receive a citation from a participant with whom one interacted at a conference. We also observe that the faster an interaction between two researchers induces a citation between them, the longer this series of induced citations goes on. Finally, we propose a machine-learning-based recommendation system, ‘Whom-to-Interact’, which suggests to researchers attending a conference whom they should interact with to gain incoming citations. The experimental results show a decent performance of the system, along with the impact of different regulating factors.
We develop and maintain the following online systems:
- OCR++: http://www.cnergres.iitkgp.ac.in/OCR++/home/
- FeRoSA: http://www.cnergres.iitkgp.ac.in/ferosa/
- Discern: http://www.cnergres.iitkgp.ac.in/DisCern
- Permanence: http://cse.iitkgp.ac.in/resgrp/cnerg/permanence/
- Circle: http://cse.iitkgp.ac.in/resgrp/cnerg/circle
- PubIndia: http://cse.iitkgp.ac.in/resgrp/cnerg/PubIndia/
Available datasets: The following datasets are freely available and can be requested for download:
- Bibliographic dataset
- Citation context dataset
- Patent dataset
- Mayank Singh, Rajdeep Sarkar, Pawan Goyal, Animesh Mukherjee, Soumen Chakrabarti. 2017. Relay-Linking Models for Prominence and Obsolescence in Evolving Networks. SIGKDD 2017 [pdf]
- Mayank Singh, Ajay Jaiswal, Priya Shree, Arindam Pal, Animesh Mukherjee, Pawan Goyal. 2017. Understanding the Impact of Early Citers on Long-Term Scientific Impact. JCDL 2017 [pdf]
- Mayank Singh, Abhishek Niranjan, Divyansh Gupta, Nikhil Angad Bakshi, Animesh Mukherjee, Pawan Goyal. 2017. Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science. JCDL 2017 [pdf]
- Sandipan Sikdar, Matteo Marsili, Niloy Ganguly, Animesh Mukherjee. Influence of Reviewer Interaction Network on Long-term Citations: A Case Study of the Scientific Peer-Review System of the Journal of High Energy Physics, JCDL, 2017. [pdf]
- Marcin Bodych, Niloy Ganguly, Tyll Kruger, Animesh Mukherjee, Rainer Seigmund-Schultze and Sandipan Sikdar. Threshold based epidemic dynamics in systems with memory, Europhysics Letters, 2017. [pdf]
- Mayank Singh, Soham Dan, Sanyam Agarwal, Pawan Goyal, Animesh Mukherjee. 2017. AppTechMiner: Mining Applications and Techniques from Scientific Articles WOSP, JCDL 2017 [pdf]
- Mayank Singh, Barnopriyo Barua, Priyank Palod, Manvi Garg, Sidhartha Satapathy, Samuel Bushi, Kumar Ayush, Krishna Sai Rohith, Tulasi Gamidi, Pawan Goyal, Animesh Mukherjee. 2016. OCR++: A Robust Framework For Information Extraction from Scholarly Articles. COLING 2016 [pdf]
- Mayank Singh, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal. 2016. Is this conference a top-tier? ConfAssist: An assistive conflict resolution framework for conference categorization Journal Of Informetrics 2016 [pdf]
- Tanmoy Chakraborty, Amrith Krishna, Mayank Singh, Pawan Goyal, Niloy Ganguly, Animesh Mukherjee. 2016. FeRoSA: A faceted recommendation system for scientific articles PAKDD 2016 [pdf]
- Binny Mathew, Unnikrishnan TA, Tanmoy Chakraborty, Niloy Ganguly, Samik Datta. Mining Twitter Conversations around E-commerce Promotional Events. 19th ACM conference on Computer-Supported Cooperative Work and Social Computing (CSCW), 2016. [pdf]
- Sandipan Sikdar, Matteo Marsili, Niloy Ganguly and Animesh Mukherjee. “Anomalies in the peer-review system: A case study of the journal of High Energy Physics”, CIKM, 2016. [pdf]
- Mayank Singh, Vikas Patidar, Suhansanu Kumar, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal. 2015. The role of citation context in predicting long-term citation profiles: an experimental study based on a massive bibliographic text dataset CIKM 2015 [pdf]
- Mayank Singh, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal. 2015. ConfAssist: A Conflict resolution framework for assisting the categorization of Computer Science conferences JCDL 2015 [pdf]
- Mayank Singh, Soumajit Pramanik, Tanmoy Chakraborty. 2015. PubIndia: A Framework for Analyzing Indian Research Publications in Computer Sciences WOSP, JCDL 2015 [DLib]
- Tanmoy Chakraborty, Niloy Ganguly, Animesh Mukherjee. An author is known by the context she keeps: significance of network motifs in scientific collaborations, Social Network Analysis and Mining (SNAM), 2015. [pdf]
- Tanmoy Chakraborty. Leveraging disjoint communities for detecting overlapping community structure, Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2015. [pdf]
- Tanmoy Chakraborty, Suhansanu Kumar, Pawan Goyal, Niloy Ganguly, Animesh Mukherjee. On the categorization of scientific citation profiles in computer sciences, Communications of the ACM (CACM), 2015.
- Tanmoy Chakraborty, Vihar Tammana, Niloy Ganguly, Animesh Mukherjee. Understanding and Modeling Diverse Scientific Careers of Researchers, Journal of Informetrics. 2015. [pdf]
- Tanmoy Chakraborty, Sikhar Patranabis, Pawan Goyal, Animesh Mukherjee. On the formation of circles in co-authorship networks, In Proceedings of 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2015. [pdf]
- Soumajit Pramanik, Pranay Hasan Yerra and Bivas Mitra. “Whom-to-Interact: Does Conference Networking Boost Your Citation Count?”, The 2nd IKDD (India chapter of ACM SIGKDD) Conference on Data Sciences (CoDS), 2015.
- Tanmoy Chakraborty, Sandipan Sikdar, Niloy Ganguly, Animesh Mukherjee. Citation Interactions among Computer Science Fields: A Quantitative Route to the Rise and Fall of Scientific Research, Social Network Analysis and Mining, 2014. [pdf]
- Tanmoy Chakraborty, Niloy Ganguly, Animesh Mukherjee. Automatic Classification of Scientific Groups as Productive: An Approach based on Motif Analysis, In Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Beijing, China August 17-20, 2014. [pdf]
- Tanmoy Chakraborty, Suhansanu Kumar, Pawan Goyal, Niloy Ganguly, Animesh Mukherjee. Towards a Stratified Learning Approach to Predict Future Citation Counts, Digital Libraries (ACM/IEEE JCDL, TPDL), 2014. [pdf]
- Tanmoy Chakraborty, Sriram Srinivasan, Niloy Ganguly, Animesh Mukherjee, Sanjukta Bhowmick. On the permanence of vertices in network communities, 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2014. [pdf]
- Tanmoy Chakraborty, Vihar Tammana, Niloy Ganguly, Animesh Mukherjee. Analysis and Modeling of Lowest Unique Bid Auctions, The Sixth ASE International Conference on Social Computing (SocialCom-2014), 2014. [pdf]
- Tanmoy Chakraborty, Niloy Ganguly, Animesh Mukherjee. Rising Popularity of Interdisciplinary Research – an Analysis of Citation Networks, Workshop on Science and Engineering of Social Networks (SCINSE), 6th International Conference on Communication System and Networks (COMSNETS-2014), 2014. [pdf]
- Tanmoy Chakraborty, Sriram Srinivasan, Niloy Ganguly, Sanjukta Bhowmick, Animesh Mukherjee. Constant Communities in Complex Networks, Nature Scientific Reports, 2013. [pdf]
- Tanmoy Chakraborty, Srijan Kumar, M Dastagiri Reddy, Suhansanu Kumar, Niloy Ganguly, Animesh Mukherjee. Automatic Classification and Analysis of Interdisciplinary Fields in Computer Sciences, 2013 ASE/IEEE International Conference on Social Computing (SocialCom-2013), 2013. [pdf]
- Tanmoy Chakraborty, Abhijnan Chakraborty. OverCite: Finding Overlapping Communities in Citation Network, In Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013. [pdf]
- Tanmoy Chakraborty, Sandipan Sikdar, Vihar Tammana, Niloy Ganguly, Animesh Mukherjee. Computer Science Fields as Ground-truth Communities: Their Impact, Rise and Fall, In Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2013. [pdf]