Beida Jade Bird - Xi'an Zhihui (北大青鸟-西安智荟)
jsoup v1.10.2 Released
Posted by: 智小荟    Published: 2017-03-03 14:55

 

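jsoup is an HTML parser for Java that can parse a URL or raw HTML text directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.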

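jsoup's main features:

  • Parse HTML from a URL, file, or string;
  • Find and extract data using DOM traversal or CSS selectors;
  • Manipulate HTML elements, attributes, and text;
  • jsoup is released under the MIT license, so it can safely be used in commercial projects.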
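
As a rough illustration of the features listed above, the following sketch parses an HTML string, selects elements with a CSS selector, and edits one of them. It assumes the jsoup library is on the classpath; the class name `JsoupBasics`, the sample markup, and the base URL `https://example.com/` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBasics {
    // Made-up sample markup, for illustration only.
    static final String HTML =
            "<html><head><title>Demo</title></head><body>"
          + "<div id=\"news\"><a href=\"/a\">First</a> <a href=\"/b\">Second</a></div>"
          + "</body></html>";

    /** Parses an HTML string and collects the text of every link inside div#news. */
    static List<String> linkTexts(String html) {
        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "https://example.com/");
        List<String> texts = new ArrayList<>();
        for (Element link : doc.select("div#news a[href]")) {
            texts.add(link.text());
        }
        return texts;
    }

    /** Changes the first link's text, adds an attribute, and returns its absolute URL. */
    static String editFirstLink(String html) {
        Document doc = Jsoup.parse(html, "https://example.com/");
        Element first = doc.select("div#news a[href]").first();
        first.text("Updated").attr("rel", "nofollow");
        return first.absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println(linkTexts(HTML));
        System.out.println(editFirstLink(HTML));
    }
}
```

To parse a live page instead of a string, `Jsoup.connect(url).get()` returns a `Document` that can be queried in the same way.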

Changelog

  • Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading the HTML entity files. About 1.72x faster in this area.
  • Added Element.is(query) to check if an element matches this CSS query.
  • Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select next and previous element siblings from a current selection, with optional selectors.
  • Added Node.root() to get the topmost ancestor of a Node.
  • Added the new selector :containsData(), to find elements that hold data, like script and style tags.
  • Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the whitelist, and does not include HTML errors. The Jsoup.Cleaner.isValid(Document) method now also makes sure the doc only includes body HTML.
    <https://github.com/jhy/jsoup/issues/245>
    <https://github.com/jhy/jsoup/issues/632>
  • In Whitelists, validate that a removed protocol exists before removing said protocol.
  • Allow the Jsoup.Connect thread to be interrupted when reading the input stream; this helps when reading from a long stream of data that never triggers a read timeout.
    <https://github.com/jhy/jsoup/pull/712>
  • Jsoup.Connect now uses a desktop user agent by default. Many developers were getting caught by not specifying the user agent, and sending the default 'Java'. That causes many servers to return different content than what they would to a desktop browser, and what the developer was expecting.
  • Increased the default connect/read timeout in Jsoup.Connect to 30 seconds.
  • Jsoup.Connect now detects if a header value is actually in UTF-8 rather than the ISO-8859-1 encoding given in the HTTP spec, and converts the header value appropriately. This improves compatibility with servers that are configured incorrectly.
  • Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not correctly encoded into a URL-safe form.
    <https://github.com/jhy/jsoup/issues/706>
  • Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed.
    <https://github.com/jhy/jsoup/issues/408>
  • Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException.
  • Bugfix: the contents of Comment nodes were not returned by Element.data().
  • Bugfix: if the source was checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r char.
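
The header-charset item in the list above describes a general technique that can be sketched with the Java standard library alone. The code below is not jsoup's actual implementation; it is a minimal sketch that assumes the header value reached the application already decoded as ISO-8859-1. The class `HeaderCharsetSketch` and the method `fixHeaderEncoding` are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class HeaderCharsetSketch {

    /**
     * Hypothetical helper: takes a header value that was decoded as ISO-8859-1
     * and re-decodes it as UTF-8 when the underlying bytes form valid UTF-8.
     * Otherwise the value is returned unchanged.
     */
    public static String fixHeaderEncoding(String value) {
        if (value == null) return null;

        boolean hasNonAscii = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Chars above 0xFF cannot come from an ISO-8859-1 decode; leave such values alone.
            if (c > 0xFF) return value;
            if (c > 0x7F) hasNonAscii = true;
        }
        // Pure ASCII is identical in both encodings, so there is nothing to fix.
        if (!hasNonAscii) return value;

        // ISO-8859-1 maps chars 0-255 to single bytes, so this recovers the raw bytes.
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);

        // A strict decoder reports malformed input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: treat the value as genuine ISO-8859-1.
            return value;
        }
    }
}
```

The strict decoder is the key design choice here: genuine ISO-8859-1 text with accented characters rarely forms a valid UTF-8 byte sequence, so a successful strict decode is a reasonable signal that the server actually sent UTF-8.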
