Skip to content

Instantly share code, notes, and snippets.

@rampion
Created May 11, 2012 13:55
Show Gist options
  • Star 36 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save rampion/2659812 to your computer and use it in GitHub Desktop.
Save rampion/2659812 to your computer and use it in GitHub Desktop.
red-black trees in haskell, using GADTs and Zippers
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE StandaloneDeriving #-}
module RedBlackTree where
data Zero
data Succ n
type One = Succ Zero
data Black
data Red
-- red-black trees are rooted at a black node
data RedBlackTree a = forall n. T ( Node Black n a )
deriving instance Show a => Show (RedBlackTree a)
-- all paths from a node to a leaf have exactly n black nodes
data Node c n a where
-- all leafs are black
Leaf :: Node Black One a
-- internal black nodes can have children of either color
B :: Node cL n a -> a -> Node cR n a -> Node Black (Succ n) a
-- internal red nodes can only have black children
R :: Node Black n a -> a -> Node Black n a -> Node Red n a
deriving instance Show a => Show (Node c n a)
-- one-hole context for red-black trees
data Context m c n a where
-- if we're at the root, the hole is a black node
Root :: Context n Black n a
-- we can go left or right from a red node hole, creating a hole for a black node
BC :: Bool -> a -> Node Black n a -> Context m Red n a -> Context m Black n a
-- we can go left or right from a black node hole, creating a hole for either
EC :: Bool -> a -> Node cY n a -> Context m Black (Succ n) a -> Context m cX n a
data Zipper m a = forall c n. Zipper (Node c n a) (Context m c n a)
-- create a zipper
unZip :: Node Black n a -> Zipper n a
unZip = flip Zipper Root
-- destroy a zipper
zipUp :: Zipper m a -> Node Black m a
zipUp (Zipper x Root) = x
zipUp (Zipper x (BC goLeft a y c)) = zipUp $ Zipper (if goLeft then R x a y else R y a x) c
zipUp (Zipper x (EC goLeft a y c)) = zipUp $ Zipper (if goLeft then B x a y else B y a x) c
-- locate the node that should contain a in the red-black tree
zipTo :: Ord a => a -> Zipper n a -> Zipper n a
zipTo _ z@(Zipper Leaf _) = z
zipTo a z@(Zipper (R l a' r) c) = case compare a a' of
EQ -> z
LT -> zipTo a $ Zipper l (BC True a' r c)
GT -> zipTo a $ Zipper r (BC False a' l c)
zipTo a z@(Zipper (B l a' r) c) = case compare a a' of
EQ -> z
LT -> zipTo a $ Zipper l (EC True a' r c)
GT -> zipTo a $ Zipper r (EC False a' l c)
-- create a red-black tree
empty :: RedBlackTree a
empty = T Leaf
-- insert a node into a red-black tree
-- (see http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Insertion)
insert :: Ord a => a -> RedBlackTree a -> RedBlackTree a
insert a t@(T root) = case zipTo a (unZip root) of
-- find matching leaf and replace with red node (pointing to two leaves)
Zipper Leaf c -> insertAt (R Leaf a Leaf) c
-- if it's already in the tree, there's no need to modify it
_ -> t
insertAt :: Node Red n a -> Context m c n a -> RedBlackTree a
-- 1) new node is root => paint it black and done
insertAt (R l a r) Root = T $ B l a r
-- 2) new node's parent is black => done
insertAt x (EC b a y c) = T . zipUp $ Zipper x (EC b a y c)
-- 3) uncle is red => paint parent/uncle black, g'parent red. recurse on g'parent
insertAt x (BC pb pa py (EC gb ga (R ul ua ur) gc)) = insertAt g gc
where p = if pb then B x pa py else B py pa x
u = B ul ua ur
g = if gb then R p ga u else R u ga p
-- 4) node is between parent and g'parent => inner rotation
insertAt (R l a r) (BC False pa py pc@(EC True _ _ _)) = insertAt (R py pa l) (BC True a r pc)
insertAt (R l a r) (BC True pa py pc@(EC False _ _ _)) = insertAt (R r pa py) (BC False a l pc)
-- 5) otherwise => outer rotation
-- XXX: GHC seems unable to infer that gy is Black so I have to do both cases
-- explicitly, rather than
-- insertAt x (BC True pa py (EC True ga gy gc)) =
-- T . zipUp $ Zipper (B x pa $ R py ga gy) gc
-- insertAt x (BC False pa py (EC False ga gy gc)) =
-- T . zipUp $ Zipper (B (R gy ga py) pa x) gc
insertAt x (BC True pa py (EC True ga gy@Leaf gc)) =
T . zipUp $ Zipper (B x pa $ R py ga gy) gc
insertAt x (BC True pa py (EC True ga gy@(B _ _ _) gc)) =
T . zipUp $ Zipper (B x pa $ R py ga gy) gc
insertAt x (BC False pa py (EC False ga gy@Leaf gc)) =
T . zipUp $ Zipper (B (R gy ga py) pa x) gc
insertAt x (BC False pa py (EC False ga gy@(B _ _ _) gc)) =
T . zipUp $ Zipper (B (R gy ga py) pa x) gc
-- can't derive, since we abstract over n, so we have to manually
-- check for identical structure
instance Eq a => Eq (RedBlackTree a) where
T Leaf == T Leaf = True
T (B l@(B _ _ _) a r@(B _ _ _)) == T (B l'@(B _ _ _) a' r'@(B _ _ _)) =
a == a' && T l == T l' && T r == T r'
T (B (R ll la lr) a r@(B _ _ _)) == T (B (R ll' la' lr') a' r'@(B _ _ _)) =
a == a' && la == la' &&
T ll == T ll' && T lr == T lr' && T r == T r'
T (B l@(B _ _ _) a r@(R rl ra rr)) == T (B l'@(B _ _ _) a' r'@(R rl' ra' rr')) =
a == a' && ra == ra' &&
T l == T l' && T rl == T rl' && T rr == T rr'
T (B (R ll la lr) a r@(R rl ra rr)) == T (B (R ll' la' lr') a' r'@(R rl' ra' rr')) =
a == a' && la == la' && ra == ra' &&
T ll == T ll' && T lr == T lr' && T rl == T rl' && T rr == T rr'
_ == _ = False
-- can't derive, since B abstracts over child node colors, so
-- manually check for identical structure
instance Eq a => Eq (Node c n a) where
Leaf == Leaf = True
R l a r == R l' a' r' = a == a' && l == l' && r == r'
b@(B _ _ _) == b'@(B _ _ _) = T b == T b'
_ == _ = False
@michaelt
Copy link

Just for the heck of it I tried it with DataKinds, to make it use the ordinary data Nat = ... and a data RedBlack = Red | Black. It worked fine as you no doubt would have expected, only a couple of signatures needed changing. What most surprised me is that the deriving still worked. You can add deriving Show ... for Context and Zipper, by the way, and then zip around in ghci just fine.

@gasche
Copy link

gasche commented May 12, 2012

There is T . zipUp $ redundancy in your insertAt function. It would be nicer, I think, if it returned a Zipper instead:

  • less redudant code
  • a clearer signature: you take an almost-zipper with a potentially-wrong color on the node, and re-balance it into a correct zipper
  • while insert would zipUp the returned zipper immediately, there might be use cases where this zipper has value; I'm wondering for example if code to insert a list of (key,value) pair in one batch could not reuse the resulting context

@rampion
Copy link
Author

rampion commented May 12, 2012

I can think of two problems with having insertAt return a Zipper.

  1. User confusion with the Zipper location. If insertAt recurses, the final Zipper may not point to the same position in the tree as before the insertion - it could move up from a leaf all the way to the root. I'd put this down as surprising behaviour for a Zipper, so I'd want to code a return to the original location.
  2. Type. In case 1 we paint the red node black and insert it at the root. With a zipper, this would have a signature of Node Red n a -> Context n Black n a -> Zipper (Succ n) a since this adds one black node to every path from the root to a leaf. In case 2 however, there tree's number of black nodes remains the same, so that would have a signature of Node Red n a -> Context m c n a -> Zipper m a. So we'd actually need to return Either (Zipper (Succ m) a) (Zipper m a).

Although I suppose I could alter RedBlackTree a to contain a Zipper m a rather than a Node Black m a, and update insert to make sure we don't need to check upwards.

@simonpj
Copy link

simonpj commented May 17, 2012

Nice. I've taken a copy for GHC's regression test suite (with a link back here), so that we'll know if this program ever fails to compile. Please yell if you don't want me to do that and I'll remove it again.

Simon

@bluescreen303
Copy link

Does anyone have an explanation for why ghc isn't able to infer gy is Black? (line 87)
I tried using DataKinds and KindSignatures, which works fine, but doesn't seem to help infering this.

As types are closed (you cannot add new constructors), I would expect it's possible to create a mapping from types to and from constructors.

@simonpj
Copy link

simonpj commented May 17, 2012

Well, gy :: Node cY n a
and cY is existentially bound with absolutely no other static information. In the code that works you pattern-match explicilty on Leaf and B, but you omit a match on R. So, why do you think R is impossible here? You need to explain why in the types!

I think you believe that cY must be Black. But why?

@rampion
Copy link
Author

rampion commented May 17, 2012

I believe cY must be Black because if it were Red it must have been constructed using R, which would have been covered by an earlier pattern match:

-- 3) uncle is red => paint parent/uncle black, g'parent red.  recurse on g'parent
insertAt x (BC pb pa py (EC gb ga (R ul ua ur) gc)) = --...

So it looks to me like I'm failing to prove that given gy :: Node cY n a and gy does not match R _ __ _, that cY is Black despite the fact that only other two constructors constrain cY to be Black.

Which I really think of as a failure on my part to understand GHC's type inference.

@bluescreen303
Copy link

When I comment out the cases at line 93-100 and run with -Wall:

    Warning: Pattern match(es) are non-exhaustive
             In an equation for `insertAt':
                 Patterns not matched:
                     (R _ _ _) (BC False _ _ (EC False _ Leaf _))
                     (R _ _ _) (BC False _ _ (EC False _ (B _ _ _) _))
                     (R _ _ _) (BC True _ _ (EC True _ Leaf _))
                     (R _ _ _) (BC True _ _ (EC True _ (B _ _ _) _))

While I don't know a lot about ghc's type inference, this proves to me that some part of GHC is able to figure out that "Leaf" and "B" are the only 2 constructors possible in this spot. So when I write a _ there, it can be inferred that whatever is matched there is always gonna be Black.

@simonpj
Copy link

simonpj commented May 18, 2012

Consider this

data T a where
TX :: T Int
TY :: T Bool
TZ :: T Bool

f :: a -> T a -> a
f v TX = v+1
f v _ = not v

Should this type check. You argue: sinc you have matched TX,
'a' must be 'Bool'. Hence (not v) should be fine.

But this does not work at all. For a start the equations are
treated separately by the type checker. And, more important,
there is not translation into our internal langauage FC. The
argument relies on the fact that other alternatives are eliminated
so the only remaining alternative is that aBool. But there is
no positive evidence that a
Bool.

So we have to write

f :: a -> T a -> Bool
f v TX = v > 1
f v TY = not v
f v TZ = not v

If we translate into FC we get:

f = /\a. (v::a) (t::T a).
case t of
TX (c :: aInt) -> (v |> c) > 1
TY (c :: a
Bool) -> not (v |> c)
TZ (c :: a~Bool) -> not (v |> c)

Notice that the proof, c, that (a~Bool) is used in the RHS
to cast 'v' from 'a' to Bool.

Sorry!

@rampion
Copy link
Author

rampion commented May 18, 2012

Thanks for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment