The language compiled by phc
****************************

The original code for phc was the hbcc compiler, an abortive
Haskell-in-Haskell compiler written by Lennart Augustsson between 1993
and 1995 (or thereabouts).  It therefore compiles the Haskell language
as it existed back then (Haskell 1.3ish).  There are various
forward-looking additions to the compiler, and there have been
syntactic tweaks over the years.  This document outlines what to
expect.  

****************
Missing syntax
****************

There are two important pieces of missing syntax: Qualified names (and
qualified imports), and records with named fields.  Import lines look
something like these:

  import List hiding (nub, nubBy)

  import List(sort, group)

Export lines use the old export syntax.  In particular, you can't
refer to e.g. "module List" in your export line.

If you use qualified imports due to lots of overlapping names, you may
be out of luck porting your programs to Eager Haskell for the moment.

Named records are easier to work around, but may still require you to
rewrite a lot of your pattern matching code if you use record matching
extensively.  The Haskell 98 report gives a mapping.

****************
Mangled syntax
****************

* Pragmas

Eager Haskell has a number of pragmas, most of them for interface
files.  The two useful pragmas in user code are INLINE and NOINLINE.
The former tells the compiler that inlining particular functions is a
good idea.  The latter tells the compiler that it's probably a
terrible idea.

* newtype and strict fields: a caveat.

The newtype construct exists and acts as expected.  Alas, the
following two declarations do exactly the same thing:

data Foo = Foo !Int

newtype Foo = Foo Int

So data types with a single constructor, whose constructor has a
single strict field, also behave like newtypes.  Most of the time this
won't make any apparent difference, but there are subtle semantic
differences having to do with when the Int is required to be defined:

  checkfoo (Foo _) = True

checkfoo (Foo undefined) should be "undefined" for the strict-field
type and "True" for the newtype (right?).  It will always be True in
Eager Haskell.

* existential types

Our compiler has an early and experimental implementation of
existential types.  Unlike more recent versions, the quantification of
type variables is *NOT* made explicit.  Thus:

data AFoo = ABar a (a -> Int) (a -> Bool)

defines an existential type constructor ABar:

ABar :: a -> (a -> Int) -> (a -> Bool) -> AFoo

* explicit constructor types

An explicit type can be given to a constructor, something like this:

data Array b e = Array  b b (Vector e) :: Array b e
               | Array2 b1 b2 b1 b2 (Vector e) :: Array (b1,b2) e

This was an experiment ages ago and may let you do terrible things.
You're on your own if you play with this.

* multiparameter type classes

Based on the early gofer implementation of the same idea.  There are
no functional dependencies, and things may thus sometimes fail to type
for reasons which are not immediately apparent.  IMPORTANT: without
functional dependencies, every method type must mention all the types in the
class declaration.

class Indexable c e where
  (!) :: (Ix a) => c a -> e

* pH loop syntax

There is a flag -floop-syntax to allow pH-style "for" and "while"
loops (which are sugar for tail recursion).  See "Implicit Parallel
Programming in pH" by R.S. Nikhil and Arvind (Morgan-Kaufman, 2001) if
you want to learn more.

* pH barrier syntax

There is also a flag -fpar-seq which permits pH-style binding syntax.
This allows the use of bare expessions in lieu of bindings (useless in
the absence of side effects), allows bindings to be grouped using
parentheses (which introduces a new indentation context), and permits
the use of barriers (>>>), which strongly sequence execution.  See
"Implicit Parallel Programming in pH" by R.S. Nikhil and Arvind
(Morgan-Kaufman, 2001) if you want to learn more about barriers.  The
use of -fpar-seq disables many code motion optimizations which would
be unsafe in the presence of barriers or side effecting computations.

Here's an example and its parse, to give you an idea of what's going
on:

g x =
  let ( b = a + 2
	>>>
        a * b + b
	>>>
	(   a + b *
	    c + d)
	(   a + b     -- AMBIGUOUS.  Parsed as two expressions.
	    c + d )
	(   a + b
	  * c + d )
	(   a + b 
	  ) * (c + d)
	c = b - a
	>>>
	(   d = c * b
	    e = c `quot` b)
      )
      a = x + 5
  in d * e

g x =
  let { ( b = a + 2 
          >>>
	  a * b + b
	  >>>
          ( (a + b * c + d);
	    ( a + b;
              c + d );
            (a + b * c + d);
            (a + b) * (c + d);
	    c = b - a
	  )
	  >>>
          ( d = c * b;
            e = c `quot` b)
	);
        a = x + 5 
  } in  d * e

A few things to notice.  First, ";" and indentation bind more tightly
than ">>>".  This means that a binding like this:
  let a
      b
      >>>
      c
is parsed as (a;b)>>>c which may not be what you want.  Use
parentheses to disambiguate.  In the above example, the barrier region
is enclosed in parentheses and indented to set it apart from the
binding for "a".  The tight binding of ";" allows us to write
function definitions in a natural manner in barrier regions:
  a = e
  >>>
  f 0 = a
  f n = f (n - 1) + a
  u = f 17

Notice that the mixture of expressions-as-bindings and
parentheses-group-bindings can lead to confusion.  The compiler
resolves this ambiguity as follows:
- (expr) is parsed as an expr, not a binding in parentheses which
  happens to be an expression.  This should not make a difference in
  practice, but is noted in case you're hunting an ambiguity.
- the parenthesized region marked "AMBIGUOUS" above is parsed as two
  standalone expression bindings (a + b ; c + d) rather than as a single
  badly-indented expression (a + (b c) + d)
These two rules together allow the compiler to handle the various
screw cases shown above with a modicum of grace.

* unboxed types

There's support in there, sorta, but it doesn't do anything
meaningful.  All occurrences of supposedly "unboxed" values in the
Prelude code are a massive sham, as numerous declarations of the
following form make clear:

> type Int# = Int

Don't muck with unboxed types unless you know precisely what you're
doing.  

****************
Prelude
****************

Prelude functions you really want to avoid in an eager language:

enumFrom  [n..]
enumFromThen [n,m..]
iterate

The prelude should match the one for Haskell 98, with a few additions:

> reduce :: (a -> a -> a) -> a -> [a] -> a

This is like "foldl" and "foldr", except the function must be
associative and the provided value must be an identity of the
function.  It allows the deforestation pass of the compiler to
generate somewhat more efficient and/or more parallel code for
reductions.  

> reduce1 :: (a -> a -> a) -> [a] -> a

A *reduce* without an identity.

> someOrder :: [a] -> [a]

Declares that the order of elements in a list does not matter.  This
is most likely because it is being reduced with a commutative
function, or because it is being converted into a set, a bag, an
array, or the like.

> iteraten :: Int -> (a -> a) -> a -> [a]

iterate n f i = take n (iterate f i)

Only it actually behaves in a halfway reasonable fashion.

> strictList :: [a] -> [a]

Makes sure all elements of the list are in WHNF before returning.

> unfold :: (a -> Bool) -> (a -> (b,a)) -> a -> [b]

Construct a list such that it can be deforested.  Otherwise would be
defined like so:

unfold p g a 
  | p a = []
  | otherwise = e : unfold p g b
    where (e, b)

There is an additional method in class Eq: _fastEqIsSafe.  This
returns True when pointer equality implies object equality.  By
default it is False.  It must never actually examine its argument,
though definitions typically use lazy pattern matching.  Right now
only prelude types define _fastEqIsSafe; there are peculiar corner
conditions for mutually-recursive types which make deriving a
definition within the compiler rather tricky.  In practice, any data
type which doesn't perversely make (x==x) False can define
_fastEqIsSafe _ = True and start the definition for equality with:
  x == y | _fastEq x y = True

The IO type is defined in terms of an ST type (which is almost but not
quite the usual ST type, as there's no universal quantification over
the state).  This isn't Haskell-98 compliant, but will mostly work
unless you're importing GHC-specific libraries or define a "ST" monad
which just carries state around.

Note that prelude names starting with _underscore are used internally
by the compiler.  If they're not exported by PreludeCore bad things
may happen.

If there are other differences between preludes, they're probably
bugs.  Let us know.

****************
Standard Libraries
****************

* Unchanged libraries (we hope):

Char
Complex
Ratio

* List

> numberListFrom :: Int -> [a] -> [(Int, a)]
> genericNumberListFrom :: (Enum i) => i -> [a] -> [(i,a)]

numberListFrom n xs = zip [n..] xs

Except the performance will be halfway reasonable.  Better, in fact,
than the (roughly sort of) equivalent:

zip [n..length xs] xs

numberListFrom still actually works on infinite lists, too, whereas
the above does not.

> mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]

merge two lists which are presumed to be sorted in ascending order,
yielding a list in ascending order.  This is reimplemented in half the
Haskell programs I have ever examined, thus its presence in List.
Alas, for the same reason there is no "merge" function as it would
conflict with all of those definitions.

* Numeric

Is probably really flaky.  It will get attention when we have time and
need.  Sorry.

* Ix

_rangeSizeUnchecked is provided as an additional class method.  It
does not check that the lower bound lies below the upper bound.  It's
used for efficient array operations when inRange is known.

We don't assume any of the following from the report:

   range (l,u) !! index (l,u) i == i   -- when i is in range
   inRange (l,u) i == i `elem` range (l,u)
   map index (range (l,u))      == [0..rangeSize (l,u)]

Instead, we require that:
   0 <= index (l,u) i < rangeSize (l,u)  when inRange (l,u) i

And in the absence of any explicit definition to the contrary:
   index (l,u) u == rangeSize (l,u) - 1

This allows the programmer to experiment with Ix instances which
implement blocking and various other non-linear array traversals.

* Array

The arrays in Eager Haskell are *much* more flexible and richly
defined than the ones in ordinary Haskell.  In an array comprehension
(call to "array", "listArray", or "(//)"), we process the arguments IN
ANY ORDER, and any index may depend on the value of another array
element as long as no cyclic dependencies exist.  Even if cycles do
exist, elements outside the cycle are properly defined.

Contrast this with Haskell, where all the indices in a comprehension are
evaluated before the result becomes available.

The Show method of Array prints only the elements of the array which
are defined by (index,value) pairs when the array was created.  For
example:

show (array (0,9) [(1,1), (2,3)]) 
  "array (0,9) [(1,1),(2,3)]"

show (array (0,9) [(1,1), (2,undefined)])
  "array (0,9) [(1,1),(2,
  Error: undefined value accessed.

+ The Imperative class

Elements of arrays must be members of the Imperative class.  Every
data type automatically derives Imperative, so this isn't a challenge
in practice.  It limits the use of polymorphism with mutable arrays.
The presence of "Imperative" annotations in the Array library is
mostly a side effect of the use of imperative arrays to implement
purely-functional arrays.

* IO

Lots of functionality is not implemented or is poorly implemented.
Don't expect most of the is...Error functions to work, for example.
We detect *that* errors occur most of the time, but we don't do much
to figure out *which* errors have occurred.

* System

Surprisingly, System *is* fully implemented.  It also defines:

> ProgramArgs :: [String]

Which is a top-level constant equal to the arguments passed to the
program when it was started.

* Other libraries

Not implemented yet.  

****************
Non-standard libraries
****************

* ArrayCore

Access to the guts of arrays, most interestingly:

> dassocs :: (Ix a) => Array a b -> [(a, b)]

Returns the defined elements of the array after the comprehension
constructing the array has been fully consumed.  Used to define the
Show instance for Array.

* Effect

Very low-level routines for control over execution.

> ctWHNF :: a -> Bool

True if its argument is WHNF at compile time.  Otherwise False.
Not necessarily accurate; will sometimes be "False" when it could have
been "True".  Will reliably be eliminated at compile time (or the
resulting code won't link!).

> thunk :: (a -> b) -> a -> b

Lazy function application.  Lots of caveats to this one, chief among
them being that you should apply thunk to exactly as many arguments as
you intend to suspend, and should then not immediately use the result
in a strict context.  Useful if you know what you're doing, but the
compiler isn't quite robust enough yet for those circumstances to be
easy to explain.  Should not in any case be actively dangerous, it
just may make program performance worse rather than better if used poorly.

> (&&&) :: Bool -> Bool -> Bool

Parallel "and".  If either operand is "False", will return "False"
without *necessarily* fully evaluating the other one.  Thus:

undefined &&& False     => False
False     &&& undefined => False
undefined &&& undefined => undefined
False     &&& (unsafePerformIO ...) => who knows?

> (|||) :: Bool -> Bool -> Bool

Parallel "or".  See parallel "and".

> firstExp :: a -> a -> a

Returns whichever of its arguments is first noticed to be in WHNF.
This *might* still cause execution of the other argument.

> lastExp :: a -> a -> a

Returns whichever of its arguments is *last* noticed to be in WHNF.
This can be used a bit like "seq" in some code.  However, it always
discards the first argument to reach WHNF, and can thus have better
space performance than "seq" in some cases.

> firstLast :: a -> a -> (a,a)
> firstLast a b = (firstExp a b, lastExp a b)

> compOrder :: [a] -> [a]

Return the list in computation order.  Possibly very handy for the
parallel search nuts.  I don't know, give it a try.  It's a tricky
routine to write in any case, so we provide a canned version.

The canned version is rather untested at the moment.  Let me know if
it works.

> andAlso :: a -> b -> a

Return the first argument.  Make sure at least an attempt is made to
evaluate the second argument.  Not terribly useful unless you know
exactly what you're doing.

> nseq :: () -> b -> b

A special seq operator for the void type.

> alsoAll :: [()] -> ()
> alsoAll xs = reduce andAlso () (someOrder xs)

> nseqs :: [()] -> ()
> nseqs xs = reduce lastExp () (someOrder xs)

Note: does *not* use nseq!  This implementation is more
space-efficient.  

> seqs :: a -> [a] -> a
> seqs a xs = a `lastExp` reduce lastExp a (someOrder xs)

The first value will be the result if the list is empty.  Otherwise
the last value computed/seen will be returned.  Don't rely on this
choice to make particular rational sense in practice.

* ExternObj

Not fully fleshed out yet.

> newtype Ptr a

A raw pointer to objects of type a.

> class Storable a where
>   sizeof :: a -> Int
>   store  :: Ptr a -> Int -> a -> ST s ()
>   load   :: Ptr a -> Int -> ST s a

sizeof gives the size of a raw object a in bytes.
store stores an object at an offset from a Ptr.
load loads an object at an offset from a Ptr.

> data Addr
> fromAddr :: (Storable a) => Addr -> Ptr a
> toAddr :: (Storable a) => Ptr a -> Addr

Raw addresses.  Note you can't do much with them unless you're a
compiler/prelude hacker.

* ICell

I-structure cells a la pH.  You had better know what you're doing if
you don't want to lose side effects here.  You've been warned.

> data ICell a = ICell a

A cell to eventually contain objects of type a.  The "ICell" can be
pattern matched against, but should not be used as a constructor.

> emptyICell :: (Imperative a) => () -> ICell a

Create a fresh, empty ICell.

> iCell :: a -> ICell a

Creates a full ICell directly.

> iStore :: ICell a -> a -> a

Store a value in an empty ICell.  Error if the cell is already full.

> iFetch :: ICell a -> a

Fetch the contents of an ICell.

And a few non-standard extensions:

> iFull :: ICell a -> Bool

Test whether the iCell is full at the moment.

> iUnify :: ICell a -> ICell a -> ICell a

"unify" two icells: if one is full, its value is written in the
other.  

* PHArray

Mutable PH-style arrays.  IArray is an I-structure array (write-once),
MArray is an M-structure array (freely mutable, with empty and full
state for each element).

> data Array i a
> data IArray i a
> data MArray i a

iArray, mArray work like array.

iBounds, mBounds work like bounds.

(!.) works like (!) except on IArrays.

(!^) is "take"---it empties an element of an MArray.

ma !^ i => take the value from index i of ma and return it.

(!^^) is "examine", which works like (!).

> iAStore 	:: (Ix a) => IArray a b -> a -> b -> IArray a b

Store a value into IArray.  Error if element is already full.

> mAStore 	:: (Ix a, Imperative b) => MArray a b -> a -> b -> MArray a b

Block until MArray element is empty, then fill it.

> mAReplace 	:: (Ix a, Imperative b) => MArray a b -> a -> b -> MArray a b

Block until MArray element is full, then replace the value.

> mArrayInit      :: (Ix a, Imperative b) => b -> (a,a) -> MArray a b

Create an MArray with all elements initialized to a single value.

> (//^=) :: (Ix a, Imperative b) => MArray a b -> [(a, b)] -> MArray a b

Store (mAStore) elements into an extant MArray, in any order.

> (//^^=) :: (Ix a, Imperative b) => MArray a b -> [(a, b)] -> MArray a b

Replace (mAReplace) elements in an extant MArray, in any order.

> mAccum :: (Ix a) => (b -> c -> b) -> MArray a b -> [(a,c)] -> MArray a b

Accumulate into an MArray.  RHS may be accumulated in any order we
want.  The MArray is updated in place.

> iToArray        :: (Ix a) => IArray a b -> Array a b

Convert to a functional array, in place.  Future updates to the IArray
will still be visible in the functional array.

> mToArray        :: (Ix a, Imperative b) => MArray a b -> Array a b

Convert in place.  If you perform later take operations on the MArray,
you're out of luck.

> mToIArray       :: (Ix a, Imperative b) => MArray a b -> IArray a b

Convert in place.  If you perform later take operations on the MArray,
you're out of luck.

> arrayToM        :: (Ix a, Imperative b) =>  Array a b -> MArray a b

*Copies* the array to produce a new mutable array.

* PackedString

Basically the same as GHC's PackedString library, except the internals
are of course entirely different.

This includes:

psToCString           :: PackedString -> Addr
toCString             :: String -> Addr

Which are used for converting strings for primitives like getEnv and
openFile.