| License | BSD-style (see LICENSE) | 
|---|---|
| Stability | experimental | 
| Safe Haskell | Trustworthy | 
| Language | Haskell2010 | 
Data.Text.Internal.Encoding
Contents
Description
Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk!
Internals of Data.Text.Encoding.
Since: text-2.0.2
Synopsis
- validateUtf8Chunk :: ByteString -> (Int, Maybe Utf8State)
- validateUtf8More :: Utf8State -> ByteString -> (Int, Maybe Utf8State)
- decodeUtf8Chunk :: ByteString -> (StrictBuilder, ByteString, Maybe Utf8State)
- decodeUtf8More :: Utf8State -> ByteString -> (StrictBuilder, ByteString, Maybe Utf8State)
- decodeUtf8With1 :: OnDecodeError -> String -> ByteString -> Text
- decodeUtf8With2 :: OnDecodeError -> String -> Utf8State -> ByteString -> (StrictBuilder, ByteString, Utf8State)
- data Utf8State
- startUtf8State :: Utf8State
- data StrictBuilder
- strictBuilderToText :: StrictBuilder -> Text
- textToStrictBuilder :: Text -> StrictBuilder
- skipIncomplete :: OnDecodeError -> String -> Utf8State -> StrictBuilder
- getCompleteLen :: Utf8State -> Int
- getPartialUtf8 :: Utf8State -> ByteString
Documentation
validateUtf8Chunk :: ByteString -> (Int, Maybe Utf8State) Source #
Validate a ByteString as UTF-8-encoded text. To be continued using validateUtf8More.
See also validateUtf8More for details on the result of this function.
validateUtf8Chunk=validateUtf8MorestartUtf8State
Properties
Given:
validateUtf8Chunk chunk = (n, ms)
- The prefix is valid UTF-8. In particular, it should be accepted by this validation: - validateUtf8Chunk(- taken chunk) = (n, Just- startUtf8State)
validateUtf8More :: Utf8State -> ByteString -> (Int, Maybe Utf8State) Source #
Validate another ByteString chunk in an ongoing stream of UTF-8-encoded text.
Returns a pair:
- The first component nis the end position, relative to the current chunk, of the longest prefix of the accumulated bytestring which is valid UTF-8.nmay be negative: that happens when an incomplete code point started in a previous chunk and is not completed by the current chunk (either that code point is still incomplete, or it is broken by an invalid byte).
- The second component - msindicates the following:- if ms = Nothing, the remainder of the chunk contains an invalid byte, within four bytes from positionn;
- if ms = Just s', you can carry on validating another chunk by callingvalidateUtf8Morewith the new states'.
 
- if 
Properties
Given:
validateUtf8More s chunk = (n, ms)
- If the chunk is invalid, it cannot be extended to be valid. - ms = Nothing ==> - validateUtf8Mores (chunk- <>more) = (n, Nothing)
- Validating two chunks sequentially is the same as validating them together at once: - ms = Just s' ==> - validateUtf8Mores (chunk- <>more) =- first(- lengthchunk- +) (- validateUtf8Mores' more)
decodeUtf8Chunk :: ByteString -> (StrictBuilder, ByteString, Maybe Utf8State) Source #
Decode a chunk of UTF-8 text. To be continued with decodeUtf8More.
See decodeUtf8More for details on the result.
Properties
decodeUtf8Chunk=decodeUtf8MorestartUtf8State
Given:
decodeUtf8Chunk chunk = (builder, rest, ms)
builder is a prefix and rest is a suffix of chunk.
encodeUtf8(strictBuilderToTextbuilder)<>rest = chunk
Since: text-2.0.2
decodeUtf8More :: Utf8State -> ByteString -> (StrictBuilder, ByteString, Maybe Utf8State) Source #
Decode another chunk in an ongoing UTF-8 stream.
Returns a triple:
- A StrictBuilderfor the decoded chunk of text. You can accumulate chunks with(or output them with<>)toText.
- The undecoded remainder of the given chunk, for diagnosing errors and resuming (presumably after skipping some bytes).
- Justthe new state, or- Nothingif an invalid byte was encountered (it will be within the first 4 bytes of the undecoded remainder).
Properties
Given:
(pre, suf, ms) = decodeUtf8More s chunk
- If the output - preis nonempty (alternatively, if- length chunk > length suf)- s2b pre ` - append` suf = p2b s `- append` chunk- where - s2b = - encodeUtf8.- toTextp2b =- partUtf8ToByteString
- If the output - preis empty (alternatively, if- length chunk = length suf)- suf = chunk 
- Decoding chunks separately is equivalent to decoding their concatenation. - Given: - (pre1, suf1, Just s1) = - decodeUtf8Mores chunk1 (pre2, suf2, ms2) =- decodeUtf8Mores1 chunk2 (pre3, suf3, ms3) =- decodeUtf8Mores (chunk1 `B.append` chunk2)- we have: - s2b (pre1 - <>pre2) = s2b pre3 ms2 = ms3
decodeUtf8With1 :: OnDecodeError -> String -> ByteString -> Text Source #
Helper for decodeUtf8With.
Since: text-2.0.2
decodeUtf8With2 :: OnDecodeError -> String -> Utf8State -> ByteString -> (StrictBuilder, ByteString, Utf8State) Source #
Helper for decodeUtf8With,
 streamDecodeUtf8With, and lazy
 decodeUtf8With,
 which use an OnDecodeError to process bad bytes.
See decodeUtf8Chunk for a more flexible alternative.
Since: text-2.0.2
State of decoding a ByteString in UTF-8.
 Enables incremental decoding (validateUtf8Chunk, validateUtf8More,
 decodeUtf8Chunk, decodeUtf8More).
Since: text-2.0.2
startUtf8State :: Utf8State Source #
Initial Utf8State.
Since: text-2.0.2
data StrictBuilder Source #
A delayed representation of strict Text.
Since: text-2.0.2
Instances
| Monoid StrictBuilder Source # | |
| Defined in Data.Text.Internal.StrictBuilder Methods mempty :: StrictBuilder Source # mappend :: StrictBuilder -> StrictBuilder -> StrictBuilder Source # mconcat :: [StrictBuilder] -> StrictBuilder Source # | |
| Semigroup StrictBuilder Source # | Concatenation of  | 
| Defined in Data.Text.Internal.StrictBuilder Methods (<>) :: StrictBuilder -> StrictBuilder -> StrictBuilder Source # sconcat :: NonEmpty StrictBuilder -> StrictBuilder Source # stimes :: Integral b => b -> StrictBuilder -> StrictBuilder Source # | |
strictBuilderToText :: StrictBuilder -> Text Source #
Use StrictBuilder to build Text.
Since: text-2.0.2
textToStrictBuilder :: Text -> StrictBuilder Source #
Copy Text in a StrictBuilder
Since: text-2.0.2
Internal
skipIncomplete :: OnDecodeError -> String -> Utf8State -> StrictBuilder Source #
Call the error handler on each byte of the partial code point stored in
 Utf8State and append the results.
Exported for use in lazy decodeUtf8With.
Since: text-2.0.2
getCompleteLen :: Utf8State -> Int Source #
Exported for testing.
getPartialUtf8 :: Utf8State -> ByteString Source #
Exported for testing.