java.lang.Object
  org.jsoup.Jsoup
public class Jsoup
The core public access point to the jsoup functionality.
| Method Summary | |
|---|---|
| static java.lang.String | clean(java.lang.String bodyHtml, java.lang.String baseUri, Whitelist whitelist)<br>Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
| static java.lang.String | clean(java.lang.String bodyHtml, Whitelist whitelist)<br>Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
| static Connection | connect(java.lang.String url)<br>Creates a new Connection to a URL. |
| static boolean | isValid(java.lang.String bodyHtml, Whitelist whitelist)<br>Test if the input HTML has only tags and attributes allowed by the Whitelist. |
| static Document | parse(java.io.File in, java.lang.String charsetName)<br>Parse the contents of a file as HTML. |
| static Document | parse(java.io.File in, java.lang.String charsetName, java.lang.String baseUri)<br>Parse the contents of a file as HTML. |
| static Document | parse(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri)<br>Read an input stream, and parse it to a Document. |
| static Document | parse(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri, Parser parser)<br>Read an input stream, and parse it to a Document. |
| static Document | parse(java.lang.String html)<br>Parse HTML into a Document. |
| static Document | parse(java.lang.String html, java.lang.String baseUri)<br>Parse HTML into a Document. |
| static Document | parse(java.lang.String html, java.lang.String baseUri, Parser parser)<br>Parse HTML into a Document, using the provided Parser. |
| static Document | parse(java.net.URL url, int timeoutMillis)<br>Fetch a URL, and parse it as HTML. |
| static Document | parseBodyFragment(java.lang.String bodyHtml)<br>Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
| static Document | parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri)<br>Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
public static Document parse(java.lang.String html,
java.lang.String baseUri)
Parse HTML into a Document.
Parameters:
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.
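To illustrate what resolving a relative URL against a base URI means, here is a minimal JDK-only sketch. The class `BaseUriSketch` and its `resolve` helper are hypothetical names introduced for illustration; they are not part of jsoup, and jsoup's own resolution may differ in edge cases.

```java
import java.net.URI;

// Hypothetical helper, not part of jsoup: shows what "resolving a relative
// URL against a base URI" means, using only java.net.URI.
final class BaseUriSketch {
    // Resolves relativeUrl against baseUri and returns the absolute form.
    static String resolve(String baseUri, String relativeUrl) {
        return URI.create(baseUri).resolve(relativeUrl).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        // Path-relative link: resolved against the directory of the base.
        System.out.println(resolve(base, "guide.html"));
        // Root-relative link: resolved against the host of the base.
        System.out.println(resolve(base, "/img/logo.png"));
        // Already absolute: returned unchanged.
        System.out.println(resolve(base, "http://other.example/x"));
    }
}
```

Without a base URI there is nothing to resolve `guide.html` against, which is why the `baseUri` argument matters for documents that contain relative links.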
public static Document parse(java.lang.String html,
java.lang.String baseUri,
Parser parser)
Parse HTML into a Document, using the provided Parser.
Parameters:
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.
parser - alternate parser to use.
public static Document parse(java.lang.String html)
Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
Parameters:
html - HTML to parse
See Also:
parse(String, String)
public static Connection connect(java.lang.String url)
Creates a new Connection to a URL. Use to fetch and parse an HTML page.
Use examples:
Document doc = Jsoup.connect("http://example.com").userAgent("Mozilla").data("name", "jsoup").get();
Document doc = Jsoup.connect("http://example.com").cookie("auth", "token").post();
Parameters:
url - URL to connect to. The protocol must be http or https.
public static Document parse(java.io.File in,
java.lang.String charsetName,
java.lang.String baseUri)
throws java.io.IOException
Parse the contents of a file as HTML.
Parameters:
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws:
java.io.IOException - if the file could not be found, or read, or if the charsetName is invalid.
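The fallback order described for charsetName (explicit name, then a meta tag, then UTF-8) can be sketched with the JDK alone. This is a simplified illustration under stated assumptions: `CharsetFallbackSketch`, `choose`, and the regular expression are hypothetical and are not jsoup's implementation, whose actual detection may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not jsoup's implementation: a simplified illustration
// of the documented fallback order for charsetName.
final class CharsetFallbackSketch {
    // Matches charset=... inside a <meta> tag, e.g.
    // <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    private static final Pattern META_CHARSET =
        Pattern.compile("(?i)<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:-]*)");

    static Charset choose(String charsetName, byte[] content) {
        // 1. An explicit charset name takes precedence.
        if (charsetName != null) {
            return Charset.forName(charsetName);
        }
        // 2. Decode as ISO-8859-1 (every byte maps to one char) purely to
        //    scan the ASCII markup for a declared charset.
        String text = new String(content, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        // 3. Nothing declared (or unsupported): fall back to UTF-8.
        return StandardCharsets.UTF_8;
    }
}
```

Passing an explicit charsetName skips the scan entirely, which is the safer choice when the encoding of the file is already known.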
public static Document parse(java.io.File in,
java.lang.String charsetName)
throws java.io.IOException
Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.
Parameters:
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Throws:
java.io.IOException - if the file could not be found, or read, or if the charsetName is invalid.
See Also:
parse(File, String, String)
public static Document parse(java.io.InputStream in,
java.lang.String charsetName,
java.lang.String baseUri)
throws java.io.IOException
Read an input stream, and parse it to a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws:
java.io.IOException - if the file could not be found, or read, or if the charsetName is invalid.
public static Document parse(java.io.InputStream in,
java.lang.String charsetName,
java.lang.String baseUri,
Parser parser)
throws java.io.IOException
Read an input stream, and parse it to a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
parser - alternate parser to use.
Throws:
java.io.IOException - if the file could not be found, or read, or if the charsetName is invalid.
public static Document parseBodyFragment(java.lang.String bodyHtml,
java.lang.String baseUri)
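Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
baseUri - URL to resolve relative URLs against.
See Also:
Document.body()
public static Document parseBodyFragment(java.lang.String bodyHtml)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
See Also:
Document.body()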
public static Document parse(java.net.URL url,
int timeoutMillis)
throws java.io.IOException
Fetch a URL, and parse it as HTML. Provided for compatibility; in most cases use connect(String) instead.
The encoding character set is determined by the content-type header or http-equiv meta tag, or falls back to UTF-8.
Parameters:
url - URL to fetch (with a GET). The protocol must be http or https.
timeoutMillis - Connection and read timeout, in milliseconds. If exceeded, IOException is thrown.
Throws:
java.io.IOException - If the final server response != 200 OK (redirects are followed), or if there's an error reading the response stream.
See Also:
connect(String)
public static java.lang.String clean(java.lang.String bodyHtml,
java.lang.String baseUri,
Whitelist whitelist)
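Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
See Also:
Cleaner.clean(Document)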
public static java.lang.String clean(java.lang.String bodyHtml,
Whitelist whitelist)
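Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML
whitelist - white-list of permitted HTML elements
See Also:
Cleaner.clean(Document)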
public static boolean isValid(java.lang.String bodyHtml,
Whitelist whitelist)
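Test if the input HTML has only tags and attributes allowed by the Whitelist.
Parameters:
bodyHtml - HTML to test
whitelist - whitelist to test against
See Also:
clean(String, org.jsoup.safety.Whitelist)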