Indexing - create an index of web crawl data.
He gave an example of a web crawl they compressed with the system. The crawl contained 2.1 billion pages, and the rows were named in the form "com.cnn.www/index.html:http". Uncompressed, the web page contents took 45.1 TB; compressed, they took 4.2 TB, only about 9.2% of the original size.
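The row name "com.cnn.www/index.html:http" puts the hostname's components in reverse order. The Python sketch below shows that transformation. It assumes the key is built as reversed hostname + path + ":" + scheme. The function name `row_key` is hypothetical, and so are two behaviours the talk does not specify: substituting "/" for an empty path, and dropping ports and query strings. Sorting the keys shows one plausible reason for the scheme: pages from the same domain land in adjacent rows.

```python
from urllib.parse import urlsplit


def row_key(url: str) -> str:
    """Build a row key like 'com.cnn.www/index.html:http' from a URL.

    Hypothetical helper: reverses the dot-separated hostname, appends the
    path ('/' if empty, an assumption), then ':' and the scheme.
    Ports, query strings and fragments are dropped (also an assumption).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    return f"{reversed_host}{path}:{parts.scheme}"


urls = [
    "http://www.cnn.com/index.html",
    "http://www.bbc.co.uk/news.html",
    "http://money.cnn.com/markets.html",
]

# Sort the row keys the way a sorted table would store the rows.
keys = sorted(row_key(u) for u in urls)
for k in keys:
    print(k)
```

If the rows were sorted by plain hostname, www.bbc.co.uk would fall between money.cnn.com and www.cnn.com. With reversed hostnames the two cnn.com pages sort next to each other. Clustering similar pages this way is likely to help compression of the kind described above.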
Usually, users are forced to crawl the web or mine social-media sites to build their own large datasets.