[Share] Stanford Dataset Collection: Citation networks

Published: 2016-12-8 6:14:23 Editor: www.fx114.net (分享查询网)

cit-HepPh

The arXiv HEP-PH (high energy physics phenomenology) citation graph comes from the e-print arXiv and covers all citations within a dataset of 34,546 papers, with 421,578 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about this. The data covers papers from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section. The data was originally released as part of the 2003 KDD Cup.

Dataset statistics
Nodes: 34546
Edges: 421578
Nodes in largest WCC: 34401 (0.996)
Edges in largest WCC: 421485 (1.000)
Nodes in largest SCC: 12711 (0.368)
Edges in largest SCC: 139981 (0.332)
Average clustering coefficient: 0.2962
Number of triangles: 1276868
Fraction of closed triangles: 0.1457
Diameter (longest shortest path): 12
90-percentile effective diameter: 5

cit-HepTh

The arXiv HEP-TH (high energy physics theory) citation graph comes from the e-print arXiv and covers all citations within a dataset of 27,770 papers, with 352,807 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about this. The data covers papers from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-TH section. The data was originally released as part of the 2003 KDD Cup.

Dataset statistics
Nodes: 27770
Edges: 352807
Nodes in largest WCC: 27400 (0.987)
Edges in largest WCC: 352542 (0.999)
Nodes in largest SCC: 7464 (0.269)
Edges in largest SCC: 116268 (0.330)
Average clustering coefficient: 0.3295
Number of triangles: 1478735
Fraction of closed triangles: 0.1196
Diameter (longest shortest path): 14
90-percentile effective diameter: 5.4

cit-Patents

The U.S. patent dataset is maintained by the National Bureau of Economic Research. The data set spans 37 years (January 1, 1963 to December 30, 1999) and includes all the utility patents granted during that period, totaling 3,923,922 patents. The citation graph includes all citations made by patents granted between 1975 and 1999, totaling 16,522,438 citations. The patents dataset contains 1,803,511 nodes for which no outgoing-citation information is available (only their in-links are known).

Dataset statistics
Nodes: 3774768
Edges: 16518948
Nodes in largest WCC: 3764117 (0.997)
Edges in largest WCC: 16511741 (1.000)
Nodes in largest SCC: 1 (0.000)
Edges in largest SCC: 0 (0.000)
Average clustering coefficient: 0.0919
Number of triangles: 7515023
Fraction of closed triangles: 0.06714
Diameter (longest shortest path): 22
90-percentile effective diameter: 9.4

Datatang (数据堂), a platform for free research data downloads in China, provides this data-mining dataset for download: http://www.datatang.com/data/44126
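The edge convention used by these graphs (a directed edge from i to j when paper i cites paper j) and the "largest WCC" rows of the statistics can be illustrated with a short Python sketch. It assumes the data is a plain-text edge list with one whitespace-separated `source target` pair per line and `#` comment lines; that layout, the function names, and the toy sample are assumptions made for illustration, not part of the dataset description above.

```python
from collections import defaultdict

def parse_edge_list(text):
    """Parse 'source target' lines into directed (source, target) edges.

    Assumed layout: whitespace-separated integer IDs, '#' starts a comment line.
    """
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))  # src cites dst
    return edges

def largest_wcc(edges):
    """Return the node set of the largest weakly connected component.

    Edge direction is ignored here, which is what 'weakly' connected means.
    Uses a union-find structure with path compression.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:  # compress the path just walked
            parent[x], x = root, parent[x]
        return root

    for src, dst in edges:
        parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return max(groups.values(), key=len) if groups else set()

def summarize(edges):
    """Compute node/edge counts and the largest-WCC counts for an edge list."""
    nodes = {n for edge in edges for n in edge}
    wcc = largest_wcc(edges)
    wcc_edges = sum(1 for s, d in edges if s in wcc and d in wcc)
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "wcc_nodes": len(wcc),
        "wcc_edges": wcc_edges,
        "wcc_node_fraction": len(wcc) / len(nodes) if nodes else 0.0,
    }

# Hypothetical toy graph: papers 1-4 form one component, 5-6 another.
SAMPLE = "# FromNodeId\tToNodeId\n1\t2\n2\t3\n3\t1\n4\t3\n5\t6\n"

if __name__ == "__main__":
    print(summarize(parse_edge_list(SAMPLE)))
```

On a real file, the same functions could be applied to the file's contents read as text; the parenthesized values in the statistics, such as 0.996, would then correspond to `wcc_node_fraction`.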
