To start off with, there's a good page here that covers the basics of migration: http://www.xsgeo.com/course/mig.htm#intro
The simple answer to your question is computer speed - migration is a relatively computationally expensive process, and while Kirchhoff pre-stack migration techniques are (mathematically) pretty old, the sheer brute computing power needed to make them practical on 3D datasets didn't really emerge until the first large-scale parallel Linux clusters towards the end of the 1990s. I can recall run-times for PreSTM on a 3D marine survey before this (using things like Crays and SGI Origins) being calculated at close to *years* for a few thousand square kilometres. It's much faster now, but still usually 60 to 200 times slower than a post-stack migration.
It's not quite that simple, though: pre-stack migration techniques generally require a fairly uniform offset distribution to be effective - each offset plane or cube is migrated separately, so large gaps become problematic. Similarly, it's not always the best approach on rugged topography, crooked-line 2D or hard-rock environments, where an "old fashioned" DMO (Dip Moveout), stack and PostSTM may give better results.
Post-stack migration was the standard up until the early/mid 1990s, usually in conjunction with Dip Moveout (DMO), which is a partial pre-stack time migration with a weaker sensitivity to velocity. We would always produce a "raw stack" as a product, allowing the company to remigrate the data - potentially with a post-stack depth migration - at a later stage. This was usually done via Omega-X or finite-difference type migrations that could handle lateral and vertical velocity variations.
At that point we started running "cheap" pre-stack time migrations - usually with a constant velocity or a single velocity function for the whole line/survey. This allowed the use of faster migration routines like phase shift or phase-shift-plus-interpolation routines. You then repicked velocities and stacked the data to give a partially migrated stack.
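The reason the constant-velocity case is so cheap is that the whole extrapolation can be done with FFTs in the frequency-wavenumber domain. As a rough illustration (not production code), here's a minimal Gazdag-style phase-shift migration of a zero-offset section in NumPy; the function name and the exploding-reflector (half-velocity) convention are my own choices:

```python
import numpy as np

def phase_shift_migrate(section, dt, dx, v):
    """Constant-velocity phase-shift migration of a zero-offset section.

    section : (nt, nx) array; v : medium velocity in m/s.  Uses the
    exploding-reflector trick (half velocity), so the output image has
    nt depth steps of dz = (v/2)*dt each.
    """
    nt, nx = section.shape
    vh = v / 2.0                                  # exploding-reflector velocity
    dz = vh * dt                                  # depth step tied to the time sampling
    P = np.fft.fft2(section)                      # (t, x) -> (omega, kx)
    omega = 2 * np.pi * np.fft.fftfreq(nt, dt)[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(nx, dx)[None, :]
    arg = (omega / vh) ** 2 - kx ** 2
    keep = arg > 0                                # drop evanescent energy
    kz = np.sign(omega) * np.sqrt(np.maximum(arg, 0.0))
    image = np.zeros((nt, nx))
    for iz in range(nt):
        # downward continue by iz*dz, then image at t = 0 (sum over omega)
        shifted = np.where(keep, P * np.exp(1j * kz * iz * dz), 0.0)
        image[iz] = np.fft.ifft(shifted.sum(axis=0)).real
    return image
```

As a sanity check, a flat event recorded at time t0 should come out at depth (v/2)*t0, and a diffraction hyperbola should collapse towards its apex. The phase-shift-plus-interpolation variants extend this idea to vertically varying velocity by interpolating between several reference velocities per depth step.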
The next stage was to demigrate the data with the constant velocity and then remigrate it with the new velocity field using the finite-difference type post-stack migrations. A complex process!
Shortly after that large scale parallel processing put in an appearance, initially for pre-stack depth migrations (as these could be charged at a very high premium) and then for pre-stack time migrations as the cost of parallel computing came down.
I'd usually run a post-stack time migration on a 2D line during testing so that I'd have something to compare the PreSTM to later on - that is, to compare the PostSTM against the PreSTM, not to run both migrations as final products on the same data.
The main advantage is usually improved imaging of dipping events, where the NMO correction doesn't work very well - especially on steeper dips and at depth.
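To see why NMO struggles on dip: for a plane reflector dipping at angle theta in a constant-velocity medium, the best-fitting stacking velocity is not the medium velocity but v/cos(theta) (Levin's classic result), so no single velocity pick can flatten flat and dipping events at once. A quick sketch with made-up numbers, using the image-source construction for the exact CMP traveltimes:

```python
import numpy as np

def cmp_traveltimes(offsets, v, z0, dip):
    """Exact CMP reflection times over a plane dipping by `dip` radians,
    passing z0 below the midpoint, in a constant-velocity medium,
    via the image (mirror) source construction."""
    times = []
    for x in offsets:
        s = np.array([-x / 2.0, 0.0])             # source
        r = np.array([x / 2.0, 0.0])              # receiver
        a = np.array([0.0, z0])                   # a point on the reflector
        d = np.array([np.cos(dip), np.sin(dip)])  # reflector direction (z down)
        p = np.dot(s - a, d)
        s_mirror = 2.0 * (a + p * d) - s          # source mirrored in the reflector
        times.append(np.linalg.norm(s_mirror - r) / v)
    return np.array(times)

v, z0, dip = 2500.0, 1000.0, np.radians(30)       # illustrative values only
x = np.linspace(0.0, 3000.0, 31)
t = cmp_traveltimes(x, v, z0, dip)
# fit t^2 = t0^2 + x^2 / vnmo^2 : the slope gives the apparent NMO velocity
slope, _ = np.polyfit(x ** 2, t ** 2, 1)
vnmo = 1.0 / np.sqrt(slope)
```

The fitted vnmo comes out at v/cos(30 deg), around 2887 m/s against a true medium velocity of 2500 m/s - correct the gather with either value and the other event class is mis-stacked, which is exactly the problem DMO and PreSTM were built to fix.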
PreSDM is usually needed when there are significant refraction effects on the ray-paths as a result of high-velocity-contrast structures - things like big marine canyons, salt domes, flood basalts and that kind of thing - although in general you'll get an improvement anywhere there are significant vertical and lateral velocity gradients.
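The refraction problem is just Snell's law at work: a strong velocity contrast bends ray-paths hard, and beyond the critical angle nothing is transmitted at all - behaviour that a time migration's near-vertical-ray assumptions can't honour, but a depth migration's velocity model can. A toy example, with assumed (not universal) sediment and salt velocities:

```python
import numpy as np

def transmitted_angle(theta1_deg, v1, v2):
    """Snell's law: sin(theta1)/v1 = sin(theta2)/v2.
    Returns the transmitted angle in degrees, or None beyond the
    critical angle (total internal reflection)."""
    s = np.sin(np.radians(theta1_deg)) * v2 / v1
    if s > 1.0:
        return None
    return float(np.degrees(np.arcsin(s)))

v_sed, v_salt = 2200.0, 4500.0                    # assumed velocities, m/s
crit = np.degrees(np.arcsin(v_sed / v_salt))      # critical angle, ~29.3 degrees
```

With these numbers a ray hitting the salt top at only 15 degrees is bent out to about 32 degrees, and anything steeper than ~29 degrees doesn't get into the salt at all - hence the shadow zones and distorted sub-salt images you see on time-migrated data.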