In this context, there's no real difference.
In other contexts, there is. If you see someone do something, it suggests that you saw the whole action from start to finish; if you see someone doing something, it suggests that you saw this action in progress.
For example, "I saw Mary eat that box of chocolates" means that you know she ate all the chocolates. By contrast, "I saw Mary eating that box of chocolates" means only that you know she ate some of them: she might not have finished the entire box.