页面抓取本质上是HTML页面的反向工程,也可以看成页面解释器,网页以HTML编码,HTML以树型结构表示信息,实际数据与布局代码以及效果信息混杂在一起,不能被计算机直接利用。
It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information.
页面抓取本质上是HTML页面的反向工程,也可以看成页面解释器,网页以HTML编码,HTML以树型结构表示信息,实际数据与布局代码以及效果信息混杂在一起,不能被计算机直接利用。
It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information.
应用推荐