iteratee Iteratee breaks monadic laws
John Lato
jwlato at gmail.com
Mon Feb 20 13:36:37 GMT 2012
Hi Michael,
A small update on this issue, and putting some information on the list
for posterity.
"It's not a bug, this behavior is by design".
Currently, as a general rule enumeratees do not pass EOF to their
inner iteratees. This is to allow for compositions like the
following:
compile :: StreamExpr -> Enumerator Vec m a
compile (FileSource stuff) i = tryRun =<< enumFile (fp stuff) (I.drop
(offset stuff) >> I.takeUpTo (count stuff) i)
compile (Seq exprs) i = foldr ((>=>) . compile) enumEof exprs i
If enumeratees passed along terminators, then the inner iteratee in
the "FileSource" line couldn't be retrieved, as it would always be in
a Done state after it's extracted from I.takeUpTo.
For what you want to do, though, the enumeratee must pass along EOF in
order to signal that state to the consumer, and then return an outer
iteratee in either the done or cont state depending on the inner
iteratee's behavior.
I think the best approach is to duplicate all provided enumeratees so
that both semantics are available. Fortunately the required changes
are pretty minimal. I've re-implemented the enumeratees from
Data.Iteratee.Iteratee, and the new "mapChunks" seems to behave as
required. Many core functions, such as eneeCheckIfDone* and joinI,
should be unaffected.
I'll see if I can get a patch into the repo today, and if it works for
you push it to hackage later.
Best,
John
On Thu, Feb 16, 2012 at 1:35 AM, Michael Baikov <manpacket at gmail.com> wrote:
> Hello John.
>
>> This is a reasonable thing to want to do. When I've had to deal with
>> this myself, I've essentially worked around it by not advancing the
>> stream when I know I'm at the last mark. This was possible for the
>> file format I was working with, but it may not always be the case.
>
> It's not possible in general case. And in my case it's even more complicated -
> file gets appended by separate process and I want to be able to follow it's
> tail.
>
>
>>> I know, this is a bad example, but only for simplicity sake.
>>> First two options dies after 10 loops, as expected, third one waits
>>> forever because of throwErr in joinI.
>>
>> If I don't change the second law, then it's your "consumer" that's broken :)
>
> True, but I can change EOF Nothing with EOF EndOfFileReachedDoSeekNow.
> This extra field inside EOF constructor was meant to be used to send controll
> messages from Enumerator to Iteratees/Enumeratees. This is mentioned in Oleg's
> original paper, he also mentioned this in our conversation about typesafe
> exception passing. So it there is anything broken - it's iteratee
> implementation :)
>
>
>
>>>> Both of these do print "[0,1,2]". However if your iteratee starts
>>>> injecting data into a stream it will alter the stream length, which
>>>> will cause issues with things like "I.take" and "countConsumed".
>>>> Which is another reason for the first law.
>
> Those are two different streams. Those functions are "take from outer stream"
> and "count consumed from outer stream", and not "feed into consumer" and
> "count consumed by ultimate consumer". I still can have a transformer which
> will break [Word64] into much larger [Word8] or even [Bit].
>
>
>
> On Thu, Feb 16, 2012 at 12:45 AM, John Lato <jwlato at gmail.com> wrote:
>> Hi Michael,
>>
>> Thanks very much for your reports, and this discussion. It's been
>> very useful to me.
>>
>> On Wed, Feb 15, 2012 at 11:14 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>> Hello John.
>>>
>>> I see. That's a very interesting second law, and it will be part of my
>>> next bug report :)
>>>
>>> "All iteratees must enter the done state (i.e. call the first
>>> continuation) upon receipt of EOF."
>>> This must be extended to something like "must enter the done state or
>>> emit an exception to
>>> perform seek or anything else to change this EOF state"
>>
>> I take your point here.
>>
>>>
>>> Suppose you have a file and you want to do seeking. And you want to
>>> seek only for specific marks.
>>> So first you read whole file, remembering all marks location and then
>>> after getting EOF just seek for first such mark and do some magical
>>> stream processing.
>>
>> This is a reasonable thing to want to do. When I've had to deal with
>> this myself, I've essentially worked around it by not advancing the
>> stream when I know I'm at the last mark. This was possible for the
>> file format I was working with, but it may not always be the case.
>>
>>>
>>>
>>>
>>> I know, this is a bad example, but only for simplicity sake.
>>> First two options dies after 10 loops, as expected, third one waits
>>> forever because of throwErr in joinI.
>>
>> If I don't change the second law, then it's your "consumer" that's broken :)
>>
>> But in all seriousness, what would be the desired semantics of joinI
>> if the second law is changed as you suggest? If we take
>>
>>> joinI outer = outer >>= lift . run
>>
>> as a semantic definition, the behavior is identical to the current
>> joinI up to bottom (e.g. this throws an exception, the current is
>> non-terminating).
>>
>> Or from another perspective, how exactly is the continuation meant to
>> be constructed in the general case? Since "joinI :: Iteratee s m
>> (Iteratee s' m a) -> Iteratee s m a", where are the required "s' "s
>> meant to come from? It's not a problem with Control.Monad.join,
>> because that function has both a different type and different
>> semantics (even when the types do match).
>>
>> In short, although I do think it's reasonable to want to change my
>> second law as you suggest, I'm not certain how to do so sensibly. The
>> second law essentially provides that "run :: Iteratee s m a -> m a" is
>> total for all iteratees that obey it. Your consumer causes "run" to
>> diverge. That makes joinI divergent in turn, by the meaning of joinI
>> which explicitly terminates the inner iteratee with "run". Of course
>> (like many laws) it can be usefully violated, but only if you avoid
>> the cases that the law is meant to protect against. Or if you are
>> exceedingly clever.
>>
>> I can make joinI friendlier by having it propagate an exception
>> instead of non-terminating or calling error, but for now I think
>> that's the extent of what's possible.
>>
>>>
>>> import Data.Iteratee.Iteratee
>>> import Control.Monad
>>> import Data.Iteratee.IO
>>> import Control.Monad.IO.Class
>>> import qualified Data.Iteratee as I
>>>
>>> enumFileForever :: String -> Enumerator String IO a
>>> enumFileForever fname iter = enumFileRandom 10 fname iter >>= enumEof
>>>>>= enumFileForever fname
>>>
>>> consumer :: Iteratee String IO ()
>>> consumer = liftI (go 0)
>>> where
>>> go c (Chunk s) = liftIO (putStr s) >> liftI (go c)
>>> go 10 e = error "I did 10 loops, let's die"
>>> go n e = I.seek 0 >> liftI (go (n+1))
>>>
>>> main = do
>>> -- enumFileForever "foo" consumer >>= run
>>> -- enumFileForever "foo" (join $ (mapChunks id) $ consumer) >>= run
>>> -- enumFileForever "foo" (joinI $ (mapChunks id) $ consumer) >>= run
>>>
>>>
>>>
>>> PS: There is something wrong with iteratee maillist against. Feb is missing.
>>> http://projects.haskell.org/pipermail/iteratee
>>
>> I should probably look for a new host. It's disappointing, but
>> projects.haskell.org has been problematic.
>>
>> John
>>
>>
>>>
>>>
>>> On Wed, Feb 15, 2012 at 6:12 PM, John Lato <jwlato at gmail.com> wrote:
>>>> Hi Michael,
>>>>
>>>> Yes, this is unfortunate. Iteratees that attempt to inject data into
>>>> a stream are unsound, for exactly this reason. Currently, I am aware
>>>> of two laws that well-formed iteratees should follow:
>>>>
>>>> 1. Any data returned to a stream (via the second parameter of
>>>> "idone") should be a subset of data received from the stream.
>>>> 2. All iteratees must enter the done state (i.e. call the first
>>>> continuation) upon receipt of EOF.
>>>>
>>>> The first is (in part) because of this problem; the second (together
>>>> with the first) guarantees a result upon termination.
>>>>
>>>> The first law may in fact be too strict, at least from the perspective
>>>> of monad laws. It's probably possible to write a safe version of
>>>> "inject" provided that you first take some stream data, e.g.
>>>>
>>>> inject :: Int -> Iteratee [Int] IO ()
>>>> inject x = liftI step
>>>> where
>>>> step (Chunk xs) = idone () (Chunk (x:xs))
>>>> step str = idone () str
>>>>
>>>> Of course this won't actually inject values unless you apply it to an
>>>> enumerator:
>>>>
>>>> main = do
>>>> (enumPure1Chunk [] test') >>= run >>= print
>>>> (enumPure1Chunk [] test'') >>= run >>= print
>>>>
>>>> Both of these do print "[0,1,2]". However if your iteratee starts
>>>> injecting data into a stream it will alter the stream length, which
>>>> will cause issues with things like "I.take" and "countConsumed".
>>>> Which is another reason for the first law.
>>>>
>>>> The monadic law violation is a consequence of the current design, in
>>>> which remainder stream data is carried along with a final result. In
>>>> iteratee-0.9, I'm changing the definition of iteratee to something
>>>> closer to (haven't quite decided yet):
>>>>
>>>> newtype Iteratee s m a = Iteratee{ runIter :: forall r.
>>>> (a -> r) ->
>>>> ((Stream s -> m (Iteratee s m a, Stream s)) -> r) ->
>>>> (Iteratee s m a -> SomeException -> r) ->
>>>> (forall b. m b -> (b -> (Iteratee s m a)) -> r) ->
>>>> r}
>>>>
>>>> This should fix the inject problem, although injecting elements will
>>>> still be unsound WRT I.take and similar.
>>>>
>>>> John L.
>>>>
>>>> On Wed, Feb 15, 2012 at 4:47 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>>>> There are three known monadic laws which every monad must obey:
>>>>>
>>>>> "Left identity": return a >>= f ≡ f a
>>>>> "Right identity": m >>= return ≡ m
>>>>> "Associativity": (m >>= f) >>= g ≡ m >>= (\x -> f x >>= g)
>>>>>
>>>>> Iteratee breaks the third one: "do { a; b; c }" gets desugared to this:
>>>>> "a >> (b >> c)" plain usage of "a >> b >> c" is the same as "(a >> b) >> c",
>>>>> since (>>) is defined with infixl.
>>>>>
>>>>> Here is a simple example which works with do notation, but fails to
>>>>> work with (>>)
>>>>>
>>>>> import qualified Data.Iteratee as I
>>>>> import Data.Iteratee.Iteratee
>>>>>
>>>>> inject :: Int -> Iteratee [Int] IO ()
>>>>> inject x = idone () (Chunk [x])
>>>>>
>>>>> test' :: Iteratee [Int] IO [Int]
>>>>> test' = do
>>>>> inject 2
>>>>> inject 1
>>>>> inject 0
>>>>> I.stream2list
>>>>>
>>>>> test'' :: Iteratee [Int] IO [Int]
>>>>> test'' = inject 2 >> inject 1 >> inject 0 >> I.stream2list
>>>>>
>>>>> main = do
>>>>> run test' >>= print -- [0, 1, 2]
>>>>> run test'' >>= print -- [2]
More information about the Iteratee
mailing list