Some Thoughts On InvestorRank

Chris Farmer, a VC with General Catalyst, presented some interesting data yesterday at Disrupt. He ranked VC firms on the basis of the companies they invested in as the first VC investor. If you invest in a highly successful company in the first round, you get "InvestorRank," and like Google's PageRank, that rank is transferable to other firms. If you follow an investor in the next round, some of your rank transfers to the firm that led the deal before you.
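
For readers who want to see the flavor of the computation, here is a minimal, hypothetical sketch of a PageRank-style rank propagating over a follow-on graph. The edge definition, damping factor, iteration count, and function names are my assumptions for illustration, not Chris' actual methodology.

```python
# Toy sketch of a PageRank-style "InvestorRank" over a follow-on graph.
# Illustration only; the edge definition, damping factor, and iteration
# count are assumptions, not Chris Farmer's actual methodology.

def investor_rank(follow_on_edges, damping=0.85, iterations=50):
    """follow_on_edges: list of (follower_firm, first_round_firm) pairs,
    meaning `follower_firm` followed `first_round_firm` into a deal."""
    firms = {f for edge in follow_on_edges for f in edge}
    rank = {f: 1.0 / len(firms) for f in firms}

    # Count how many firms each follower "endorses" by following them.
    out_degree = {f: 0 for f in firms}
    for follower, _ in follow_on_edges:
        out_degree[follower] += 1

    for _ in range(iterations):
        new_rank = {f: (1 - damping) / len(firms) for f in firms}
        for follower, leader in follow_on_edges:
            # A follower transfers a share of its rank to the firm
            # that led the earlier round.
            new_rank[leader] += damping * rank[follower] / out_degree[follower]
        rank = new_rank
    return rank

# Example: firm B and firm C both followed firm A; firm C also followed B.
print(investor_rank([("B", "A"), ("C", "A"), ("C", "B")]))
```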

This is an insightful way to look at the early stage venture capital business. The objective of early stage VC investors is to get into the best deals in the first round and then to get other high quality firms to follow on in the next rounds. That is how it was taught to me and it is how we have built the two firms I have co-founded.

I haven't studied Chris' data closely enough to have a point of view on the ranks he has calculated and the ratings he presented. But if I were investing in venture capital firms as an LP, this would be a big part of what I would look at.

Returns are important, but they are a trailing indicator. There is no guarantee that past returns will be an indicator of future returns. What is more important is the team, the strategy, and their ability to get into the right deals and build the right syndicates. InvestorRank is a good attempt to quantify that last bit.

#VC & Technology

Comments (Archived):

  1. leapy

    Like all great research, there’s plenty there to take issue with, but it’s a great attempt to identify a measure of investment “influence”. It also shouldn’t be too volatile over time, which lends itself to long-term tracking. I also think a key indicator is the “last ten years” ranking, where USV comes in at #2. No pressure then! 😉

  2. Dave W Baldwin

    Congratulations.

  3. bijan

    it is an interesting take for sure. and the firms listed are excellent. but this method doesn’t give any credit to inside rounds, fund size or returns. it also doesn’t take into account that each VC firm has multiple funds. In other words, a VC firm may get a great follow-on investor a few times per fund but the rest of the portfolio could be ugly. for example, i can see at least one firm on that list that has never generated carry while a number of them have returned the fund or more.

    1. ShanaC

      What do you think a truer measure of VC reach should look like then?

    2. fredwilson

      great point on inside rounds. we’ve done that in a few deals where there are no follow-ons from other VCs, and the two i am thinking of are in our top five to ten deals.

    3. paramendra

      Good points. 

    4. Chris Farmer

      Bijan, your points are correct. I debated whether to take into consideration inside rounds, but they tend to fall into two extremes – companies that need a little extra runway and companies that investors have so much conviction about that they “double down”. As it’s difficult to tell which is which without inside information, I excluded them from the data. Re: fund size, the approach does not directly deal with fund size but, typically, larger funds make more investments so the methodology automatically adjusts accordingly. Re: multiple funds, I specifically wanted to avoid looking at the data on a fund by fund basis as they have different vintages and, therefore, it’s a bit of an apples & oranges comparison. Ultimately, over time (e.g. 7-10 years) the real data will come in and specific returns are obviously a superior way to judge at that point. I wanted to create an approach that was more forward-looking than returns and this was the best I could come up with.

  4. JimHirshfield

    It is interesting, but the data set may be flawed. His source is CrunchBase, which is not exactly a complete picture of all VC activity. for starters, it skews towards successful and newsworthy companies. But you’re right, an interesting way to measure VC quality. Does this really matter to entrepreneurs? I don’t think so. It may correlate to the most value-add VCs, but the measure of what works best for an entrepreneur is a VC you can get along with, who believes in your vision and helps with strategy and connections. Yes?

    1. fredwilson

      yup  great point

    2. Chris Farmer

      Re: your first point, the base data was from CrunchBase, but we augmented it with information from the VC websites (present and past – using the Wayback Machine). We also used PR re: funding rounds. We then dual-processed the data with a team of 10-15 people and well over 500 man-hours to error-correct as much as possible. I will be donating the data to TechCrunch. Does it matter to entrepreneurs? I don’t know. It’s a starting point, but I agree that personal chemistry with the partner at the venture fund is probably the most important element. Hopefully this at least serves as one more input in helping entrepreneurs to make informed decisions.

      1. JimHirshfield

        Thanks for the detail Chris. Much more comprehensive study than I thought. I like. 🙂

  5. Kasi Viswanathan Agilandam

    A better ranking would take input from the entrepreneurs (1200*2, assuming a minimum of 2 founders) … that would reveal the entrepreneurs’ ranking of the firms. Well, whichever way we do it… there will always be criticisms and the standard comment “Lies, Damn Lies & Statistics”.

    1. fredwilson

      input from the right entrepreneurs. thefunded is an embarrassment

      1. gorilla44

        The “right” entrepreneurs from a VC’s perspective are probably dramatically different from the entrepreneurs’ perspective. The best judges of VCs are LPs. I wish I could see their rankings.

      2. Donna Brewington White

        You don’t think TheFunded has any value?

        1. fredwilson

          none whatsoever. it is garbage in, garbage out

      3. William Mougayar

        If done right, an entrepreneur’s survey of VCs would be an interesting data point. 

    2. Chris Farmer

      I agree, but it’s difficult data to get beyond anecdotes, and I wanted to come up with a methodology that was more universal and reliant only on information in the public domain.

  6. Harry DeMott

    “The objective of early stage VC investors is to get into the best deals in the first round and then to get other high quality firms to follow on in the next rounds.” I certainly agree with the first half of the statement. The second half, however, is really an outgrowth of the first half. Get great early stage deals – demonstrate that they are great by performing in the marketplace – and you will have your pick of excellent partners from there on out. What would be more interesting would be to see actual fund return data in terms of IRRs – and then look at the composite sets to see the skewness of the returns. Sure, having Google in there is great – but if it is Google and 99% donuts then it is tough to say that the firm really did a great job (as opposed to getting one monster right), compared to a firm that had a ton of singles, doubles, triples, the occasional home run – but almost no strikeouts. You don’t see that data being circulated widely.

    1. fredwilson

      the quality of the follow on investors is a proxy for the quality of the investment

    2. Mark Essel

      “Google and 99% donuts” vs a ton of singles doubles triples is the difference between being lucky and being good. Investors have a pretty good memory so I’d favor the latter even if the former outperforms it.

  7. ShanaC

    Because of its similarity to PageRank, I am wondering if this sort of perspective can cause the VC equivalent of search engine spam: overinvestment in companies that follow certain kinds of trends. Would we miss the long-tail, out-of-nowhere successes with this kind of follow-the-leader activity?

    1. DGentry

      I suspect the vastly different timescales make this unlikely.

      1. ShanaC

        We’ve had one bubble, why not another?

  8. sameer

    much needed indeed. do you think that there should be similar data/rankings for mentors too? If yes, what methodology for ranking do you suggest?

  9. reece

    awesome to see that four of the top 10 are firms started within the last 10 years. new blood ftw.

    1. Chris Farmer

      I was excited to see that as well and that several of the firms were not based in Silicon Valley (USV, First Round, GC, Spark). This is a great turn of events for the start-up ecosystem

      1. reece

        I agree. thanks for sharing.

    2. Steven Kane

      which ones? GC started more than 10 years ago. i only see 2 – Andreessen Horowitz and USV (and with all due respect to Fred, neither he nor Brad started as VCs at USV — both were well experienced before forming USV)

      1. reece

        according to this chart, USV, Andreessen, GC and Khosla. i’m not an expert, just going by the data i see here, but i’m always a fan of the upstart under-dog.

        1. Steven Kane

          GC started 12 years ago. fred wilson, marc andreessen and vinod khosla are “upstart under-dogs”?!?!?!?

          1. fredwilson

            i’ve said for a long time that the best funds are first time funds started by VCs who have been active for at least a decade. khosla, AH, and USV all fit that description

          2. reece

            re: GC – going by the data presented. re: upstarts – yes. a new firm is still a new firm, regardless of any individual’s track record. they can’t all be winners and it’s tough to fight the legacy of some of the established players.

          3. Steven Kane

            “new firm”? of course. i was reacting to “upstart under-dog” which i still think is overstating the case, a lot.

  10. BradDorchinecz

    I haven’t studied his work closely enough, but this study seems to make an interesting statement on what makes a good VC. It seems to point to sourcing and access to the best startups as a key determining factor. The age-old question we face when evaluating VCs is what has more influence on returns: a VC’s ability to source the best opportunities, or his ability to add value once he invests? I tend to think it’s a little of both. Judgment also is required to see the opportunity where others don’t. I do believe that the most important characteristic is to be the first investor in companies that take off and attract other top funds in later rounds.

    1. fredwilson

      i think the latter (adding value post investment) ultimately leads to the former (getting into the best deals)

  11. Steven Kane

    investment firms should be strictly prohibited from creating novel research on their own performance;)

    1. fredwilson

      i would second that rule

    2. Chris Farmer

      I agree. I really struggled on publishing this data. I created it for my own purposes after I left Bessemer Venture Partners as a way of assessing the quality of firms that I was considering joining. I did the initial work two years ago when I wasn’t affiliated with any fund. However, many people were interested in the work and folks like Steve Blank, Dharmesh Shah, Tom Byers, Mike Arrington, etc. STRONGLY encouraged me to release it to the entrepreneurial community. To be candid, the General Catalyst partnership has tended to avoid the press and it was a difficult decision all around as we did not want to appear self serving. GC performs well and the work was a reason why I joined the firm. In retrospect, I should have released it when I first did the work and was publishing my other findings in Harvard Business Review and the Journal of Private Equity.

      1. Steven Kane

        Hi Chris. I guess I’m a little old fashioned — I think returns (and only returns) are the metric to gauge the success of a VC fund. Or any investment. Returns that can be measured and analyzed, using commonly and widely accepted and understood algorithms. Anything else is just, well, marketing. I mean, is there any investor who would rather have “buzz” than returns? Yet 100% of investors will always trade “buzz” for returns. You say “GC performs well” — well, we can’t know that to be true or false, because you don’t include GC return data in your analysis, nor in your commentary. Please do! And return data from any/all other VCs. I’m pretty certain Cambridge Associates data does not map to your lists, but I’m willing to be persuaded otherwise.

  12. allon bloch

    The top two firms on this list have been mostly followers on great early stage investments – I don’t think that tells us anything about their current abilities to do early stage.   It just tells us they’re great followers.  Bessemer and Spark are surprisingly not on the list.  There are many ways to play this ranking game but this one seems flawed.  

  13. Patrick Dugan

    The gamification of start-up investing, what could possibly go wrong?

  14. Guillermo Ramos Venturatis.com

    I thought it was pretty straightforward to rank VC firms: net IRR to their LPs. If other methodologies are valid, I can disclose to you all today that I rank number one in the VC world according to a well-cooked criterion. If you want to rank number one as well, just drop me a call.

    1. fredwilson

      that’s a trailing indicator not a leading indicator

      1. Guillermo Ramos Venturatis.com

        you can build hundreds of “rankings” based on leading indicators, but calling any of them “Investor Rank” is confusing. An “Investor Rank” should at least be made out of a combination of 4-5 relevant leading indicators. I find Chris’ work very interesting and I congratulate him for it. Perhaps the problem is in the name.

  15. paramendra

    InvestorRank is quite a concept. Too often VC firms get ranked on lagging indicators instead. 

    1. Steven Kane

      and what exactly is wrong with that? i’d wager you are not an investor in VC funds — if you were, you wouldn’t necessarily love that returns take a long time to realize, but you would still only look at returns to decide if the investment is worthwhile

      1. paramendra

        You are right. I am not an investor in VC funds.

    2. BradDorchinecz

      There is persistence in private equity manager returns, so it does make sense to look at past returns as a piece of the overall evaluation: http://citeseerx.ist.psu.ed

  16. Charlie Crystle

    Would like to see AgeRank. 

  17. sigmaalgebra

    Okay, we are an LP; there is venture firm A; and we want to evaluate its performance. What we want to know is, broadly, how much money firm A can make for us over, say, the next 10 years.

    Let that amount of money be X. Then with really meager assumptions, X is a real-valued random variable with finite expectation and finite variance. For just what all this means, there are succinct, elegant, and profound details in, say, Jacques Neveu, ‘Mathematical Foundations of the Calculus of Probability’, Holden-Day, San Francisco. Note that I didn’t claim that all the details were easy reading.

    Well, we know that we can’t know the actual value of X now. So, we settle for estimating the expectation of X, that is, E[X]. For fine details on the expectation E[X], Neveu is excellent. Why the expectation? Because in the long run in practice, from the (weak or strong) law of large numbers, also in Neveu, that is what we will see. So, we settle for the expectation. Note: If we want to consider utility functions, then we let X be the ‘utility’ we want and maximize its expectation.

    So, how are we to estimate E[X]? Well, if we can get a ‘sample’, say, for some positive integer n, random variables Y(1), Y(2), …, Y(n), that are independent and have the same distribution as X (full details in Neveu), then we can take the average

    Z = ( Y(1) + Y(2) + … + Y(n) ) / n

    and use the real random variable Z as our estimator of E[X].

    Okay, just why would we want to use Z instead of some ‘ranking’, etc.? That is, just why is Z the ‘estimator’ we want? First, Z is ‘unbiased’, that is, E[Z] = E[X]. That is, in the long run, Z will give us the right answer instead of something off to one side. Second, among unbiased estimators, Z is the most accurate, that is, has the least variance. Details are in, say, the classic: Paul R. Halmos, “The Theory of Unbiased Estimation”, ‘Annals of Mathematical Statistics’, Volume 17, Number 1, pages 34-43, 1946. So, with Z we get the world’s only minimum-variance, unbiased estimator.

    And that’s why we use Z and why we use averages of past data, and some assumptions, to estimate future returns. And that’s why we don’t take seriously ‘heuristics’ such as ad hoc ranks. And that’s why we don’t like results that come from just ‘algorithms’ without some solid reasons to take the calculations seriously. And that’s why we laugh at much of ‘data mining, machine learning, big data, heuristics, and artificial intelligence’. And that’s why we want some solid reasons for any data analysis we do and not just some intuitive suggestions. And that’s why math with theorems and proofs is so powerful. And, Mr. P. Thiel, that’s some of why we go to college.

    Thus endeth the first lecture in Data Handling 101.
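
    (A minimal numerical sketch of the estimator described above. The lognormal "return" distribution below is purely an assumed stand-in for X, chosen for illustration.)

```python
# Sketch only: simulate the sample-mean estimator Z(n) for E[X].
# The lognormal draw is an assumed stand-in for the return X, not real data.
import random

random.seed(0)

def draw_return():
    # Assumed stand-in for X: a lognormal-ish multiple on invested capital.
    return random.lognormvariate(0.0, 1.0)

def Z(n):
    # The estimator discussed above: the average of n i.i.d. samples of X.
    return sum(draw_return() for _ in range(n)) / n

# Repeating the experiment many times shows Z(n) centers on E[X] (unbiased)
# and that its spread shrinks roughly like 1/sqrt(n).
for n in (10, 100, 1000):
    estimates = [Z(n) for _ in range(2000)]
    m = sum(estimates) / len(estimates)
    s = (sum((e - m) ** 2 for e in estimates) / len(estimates)) ** 0.5
    print(n, round(m, 3), round(s, 3))
```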

    1. Mark Essel

      1) Unbiased estimators depend on an underlying assumption about statistical distributions. If your model is off, your optimization is as well. Also, there are a number of documented cases where L1 (absolute value) outperforms L2 (least squares), which much of the classical theory relies on.

      2) “And that’s why we laugh at much of ‘data mining, machine learning, big data, heuristics, and artificial intelligence’.” I agree that there are common fallacies, but I think you’ll enjoy Jeff Jonas’ counterpoint and humorous title, “data beats math”. Mr. Jonas’ findings:

      “My gripe, if any, is that way too many people are chipping away at hard problems and making no material gains in decades (e.g., entity extraction and classification) … when what they actually need is more data. Not more of same data, by the way. No, they more likely need orthogonal data – data from a different sensor sharing some of the same domain, entities and features (e.g., name and driver’s license number).”
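
      (A tiny toy illustration of the L1-versus-L2 point, with made-up numbers: the median, the L1 minimizer, barely notices a heavy-tailed outlier, while the mean, the L2 minimizer, is dragged toward it.)

```python
# Toy illustration only: the mean minimizes squared (L2) error, the median
# minimizes absolute (L1) error, and they behave very differently when the
# sample has a heavy tail.
from statistics import mean, median

sample = [1.0, 1.2, 0.9, 1.1, 1.0, 250.0]  # one outlier ("the Google in the fund")

print("mean (L2):  ", round(mean(sample), 2))    # dragged toward the outlier
print("median (L1):", round(median(sample), 2))  # barely moves
```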

      1. sigmaalgebra

        Mark, for your:

        “Unbiased estimators depend on an underlying assumption about statistical distributions.”

        but on this I assumed merely that

        “random variables Y(1), Y(2), …, Y(n), … have the same distribution as X”

        and that is sufficient. Or, for algebraic details, which are easy enough, since, for i = 1, 2, …, n, each Y(i) has the same distribution as X, we have that E[Y(i)] = E[X]. Then from our estimator

        Z = ( Y(1) + Y(2) + … + Y(n) ) / n

        using that expectation is linear (Neveu), we have:

        E[Z] = E[( Y(1) + Y(2) + … + Y(n) ) / n]
        = ( E[Y(1)] + E[Y(2)] + … + E[Y(n)] ) / n
        = ( E[X] + E[X] + … + E[X] ) / n
        = ( n E[X] ) / n
        = E[X]

        so that E[Z] = E[X], as desired. The minimum variance part from the Halmos paper is tricky to see but, still, makes only meager assumptions about the distribution of the data.

        For your:

        “Also there are a number of documented cases where L1 (absolute value) outperforms L2 (least squares), which much of the classical theory relies on.”

        Note: In L^1 and L^2, etc., the L abbreviates H. Lebesgue. He was an E. Borel student about 100 years ago and redid the integration in calculus. One result was ‘measure theory’ which via Kolmogorov became the foundations of ‘modern probability’ as in Neveu.

        L^1, L^2 are ‘norms’, essentially definitions of distance, on ‘vector spaces’. In the case of a random variable such as X, the L^1 norm is

        || X ||_1 = E[ |X| ]

        On the left, the double vertical bars are common norm notation (in D. Knuth’s TeX they look better). On the right, the single vertical bars are just absolute value. Here I borrow from TeX and let _1 denote a subscript.

        The set of all X with finite L^1 forms a vector space. It turns out, as in Neveu, it’s ‘complete’ and, hence, a Banach space. ‘Complete’ is much as in the real numbers, that is, for a sequence that appears to converge, there is something there for it to converge to. The rationals are not complete because rationals can converge to, e.g., pi, which is not a rational. One joke is, “Calculus is the elementary properties of the completeness property of the real number system.”

        For the L^2 norm of X, that is

        || X ||_2 = E[ X^2 ]^(1/2)

        Similarly the set of all X with finite L^2 forms a vector space. It turns out (Neveu again), it’s ‘complete’ and, hence, a Banach space. But also important is, for this vector space, we can define the ‘inner product’

        (X,Y) = E[ XY ]

        Then the norm is just from the inner product:

        || X ||_2 = (X,X)^(1/2)

        With this inner product, we have a Hilbert space, which is a good thing because we can do projections. Completeness for the random variables under the L^1 and L^2 norms is a bit amazing.

        It appears that Hilbert space is an abstraction of several important examples from the 19th and 20th centuries, especially Fourier theory, various orthogonal polynomials, spherical harmonics, various parts of differential equations, etc. So, we get to do the derivations once, just from the axioms, and apply them many times. Apparently Hilbert space was a von Neumann idea; there’s a claim that once he had to explain what he meant to Hilbert!

        Back to our estimator Z: When we have n samples, let’s call our estimator Z(n). Then, as n grows large, we would like Z(n) to be a better estimator of E[X]. Well, in the L^2 norm, our error squared is:

        || Z(n) – E[X] ||_2^2 = E[ (Z(n) – E[X])^2 ]
        = E[ (Z(n) – E[Z(n)])^2 ]
        = Var( Z(n) )

        or the variance of Z(n). Such math looks MUCH better in TeX!

        But from the definition of Z(n) and our assumptions and our use of Y(1), Y(2), …, Y(n),

        Var( Z(n) ) = Var( ( Y(1) + Y(2) + … + Y(n) ) / n )
        = ( 1/n^2 ) Var( ( Y(1) + Y(2) + … + Y(n) ) )
        = ( 1/n^2 ) Var( Y(1) ) n
        = Var( Z(1) ) / n

        So, as n grows, our error

        || Z(n) – E[X] ||_2 = Std( Z(1) ) / n^(1/2)

        where Std is the standard deviation, that is, the square root of the variance.

        So for large n, our L^2 error converges to 0. So, our estimator Z(n) converges to E[X] in the L^2 norm. Then (Neveu) some subsequence of our sequence Z(n) must converge to E[X] with probability 1, that is, ‘almost surely’, and that’s the best we can hope for. In practice, or with more analysis or assumptions, Z(n) will converge to E[X] almost surely without the issue of subsequences. For practice, convergence in L^2 is good enough to take to the bank.

        If we were to work with L^1 instead, then things would be more difficult. Also we might end up estimating the median of X and, thus, have a biased estimator of the expectation of X.

        Also, we can know more than just that Z(n) converges to E[X]; in addition we can find ‘confidence’ intervals. From the sum of the Y(i) and the central limit theorem, for n over a few dozen, the distribution of Z(n) will be close to Gaussian. So, we can use a t-test to get a confidence interval. If we want to be still more careful, then there are some simple, somewhat ‘computer intensive’, ‘distribution-free’ (where we make no assumptions about the distribution of X) methods for getting a confidence interval on our estimate.

        On the value of ‘big data’, that’s questionable. I would recommend being careful about what we know about the data we seek and use. The criterion of “orthogonal” is not so good; likely he meant independent. One warning about ‘big data’ is in our

        || Z(n) – E[X] ||_2 = Std( Z(1) ) / n^(1/2)

        So if we multiply n by 100, then we divide the right side by the square root of 100, that is, 10, and, thus, get just one more significant digit of accuracy in our estimate. So, getting three significant digits is common; getting four is usually a strain; getting eight or more is nearly absurd. Commonly getting three significant digits is so easy that struggling with ‘big data’ to get, say, six significant digits is not worth the effort.

        One of the problems in practice, e.g., in ‘data mining’ and ‘machine learning’, is just ‘fitting models’. There can be some value there, but there are also some serious pitfalls: E.g., for some positive integer n and for numerical data (x(i), y(i)), for i = 1, 2, …, n, if the x(i) are distinct then we can just write down a polynomial p of degree n-1 so that, for all i, p(x(i)) = y(i). This is called Lagrange interpolation and has a richly deserved reputation for just absurd results. Fit? Yes. Useful? No! For something less absurd, we can also use various splines, least-squares splines, and versions for functions of several variables. Still, getting ‘causality’, ‘meaning’, or good predictions from such efforts is not easy.

        My estimator of the coveted E[X] was from samples Y(i), but the thread assumed some additional data about first rounds and follow on rounds. So, with this additional data, might a more accurate estimate be possible? In principle, yes. One approach, the most powerful possible in the L^2 sense given sufficient data, is just old cross tabulation. That is, and an easy exercise starting with Neveu, if we have some data Y and want to estimate X, then we can take the conditional expectation of X given Y, that is, E[X|Y]. Then there is some function f so that E[X|Y] = f(Y), and this function f minimizes

        || X – f(Y) ||_2

        Well, cross tabulation can be used to give an estimation of a discrete approximation to f. Another approach, that can work with less data, is to assume a ‘model’ with some parameters and then determine the parameters by fitting to the available real data. Pursuing this direction is beyond the scope of this post now!

        This would be, what, a Q&A session after the first lecture in Data Handling 101? Neveu? An elegant, powerful, crown jewel of civilization.
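
        (A quick numerical sketch of the Lagrange-interpolation pitfall mentioned above, using made-up, nearly linear data: the degree n-1 polynomial matches every observed point, yet its behavior between and beyond the points says nothing useful.)

```python
# Illustration only (synthetic data): exact polynomial interpolation "fits"
# every point perfectly yet can behave absurdly between and beyond them.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 12)
y = x + 0.2 * rng.standard_normal(12)           # truth is the line y = x, plus noise

interp = Polynomial.fit(x, y, deg=len(x) - 1)   # degree n-1: passes through all points

grid = np.linspace(0.0, 1.0, 1001)
print("max |p - y| at the data points:    ", np.max(np.abs(interp(x) - y)))     # ~0
print("max |p - truth| between the points:", np.max(np.abs(interp(grid) - grid)))
print("extrapolated value at x = 1.2:     ", interp(1.2))                        # typically absurd
```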

        1. Mark Essel

            Thank you for the in-depth reply and review of subspace theory. Your careful description brought back fond memories of graduate school, and your passion for the subject is obvious. Let’s focus on a couple of assumptions that appear minor but could lead to surprise and unexpected results.

            “random variables Y(1), Y(2), …, Y(n), … have the same distribution as X”

            Do you believe (would you wager money) that there is a distribution for X that bounds the behavior of the venture fund in question? Even if there is such a distribution and X is historically well behaved, how do you account for singularities?

            Also, in this step:

            || Z(n) – E[X] ||_2^2 = E[ (Z(n) – E[X])^2 ]
            = E[ (Z(n) – E[Z(n)])^2 ]
            = Var( Z(n) )

            you replace E[X] with the sequence Z(n) in order to later describe how Z(n) converges to E[X] for large n.

            || Z(n) – E[X] ||_2 = Std( Z(1) ) / n^(1/2)

            That appears to be using the result of the derivation to prove the derivation.

            With regard to the central limit theorem: “From the sum of the Y(i) and the central limit theorem, for n over a few dozen, the distribution of Z(n) will be close to Gaussian.” With this statement are you pegging E[X] as Gaussian? That can’t be correct. Once you assume E[X] is Gaussian you’re working on a different problem, and no longer trying to predict an individual venture fund’s returns. Are you stating that Fred and the team at USV are properly characterized by a Gaussian random variable :)? Of course you could fit a Gaussian curve through their portfolio returns, but doing so doesn’t make company returns adhere to the distribution. Fred has described fund performance as a power law curve in the past here.

            On big data (I wish I could annotate your comment to split off better side discussions): The limitation you describe on significant digits is predicated on your model: || Z(n) – E[X] ||_2 = Std( Z(1) ) / n^(1/2). What Jeff described was that by using simpler models and more data he achieved better results than by applying more complex models to far less data.

            Is there a large difference between model fitting (agreed there are good and poor techniques) and assumed distributions? I could represent unknown spaces of measured data with a placeholder distribution that describes local region behavior instead of generating presumed values (splines/interpolation/least squares).

            I didn’t have time to dig in and understand your mention of cross tabulation, but the relationship has me interested: E[X|Y] = f(Y) where f minimizes || X – f(Y) ||_2. The best estimator for current fund performance should weight previous performance, I just don’t know how that would look (environment and macroeconomic trends impact performance as much as the selection of the investors).

            I’ve relied on Taylor series expansions a number of times in Physics while chasing dominant relationships. I feel comfortable pursuing dominant relationships with limited parameter spaces. Allow the parameter number to grow and your fits keep on improving (always penalize additional degrees of freedom when scoring).

            ps: I republished this discussion on my blog today. I’ll be thinking about it all weekend :).
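
            (A toy illustration of the distinction being drawn here, with a Pareto distribution assumed purely as a stand-in for power-law-ish returns: single draws of X look nothing like Gaussian, while the average Z(n) of many draws is much closer to one, which is the only sense in which the central limit theorem enters above.)

```python
# Toy illustration (assumed Pareto stand-in for "power law" returns): a single
# draw of X is nothing like Gaussian, but the average Z(n) of many draws is
# much closer to one.
import numpy as np

rng = np.random.default_rng(0)

def skew(a):
    a = np.asarray(a)
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

raw = rng.pareto(4.0, size=20000) + 1.0                            # heavy-tailed "returns"
means = (rng.pareto(4.0, size=(20000, 100)) + 1.0).mean(axis=1)    # many draws of Z(100)

print("skewness of single draws of X:", round(skew(raw), 2))    # strongly skewed
print("skewness of Z(100):           ", round(skew(means), 2))  # much closer to 0
```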

          1. sigmaalgebra

            “Do you believe there is a distribution for X that bounds the behavior of the venture fund in question?”

            When you dig into the real foundations of probability as in Neveu from Kolmogorov, it is fully reasonable to say that the return the LP will see in 10 years from venture firm A is a real random variable; we’re calling it X. Then necessarily X has a distribution, that is, for real x, F_X(x) = P(X <= x). Again, from TeX, _X is a subscript. So, F_X is the ‘cumulative distribution’ of X. In anything but a really bizarre world, X has an expectation, the expectation is finite, and the expectation of X^2 is also finite. That’s all that I was assuming about X, and it’s not much.

            To be, maybe, a little more clear, if we were to do this whole ‘experiment’ again, then we would likely see a different value for X. Roughly it is those different values that determine the distribution F_X. Or, F_X ‘describes’ the ‘relative frequency’ of all the different values we ‘might’ observe for X. Such an explanation is usually regarded as convincing enough; only the very best students still upchuck and demand more! In the end, starting with meager considerations, we are forced into this ‘setup’, even if somehow we don’t like it.

            The one random variable X doesn’t really “bound the behavior” of venture firm A if only because, with our assumptions, X is permitted to be unbounded. That is, for each real number x no matter how large, we can have P(X > x) > 0.

            “Do you believe”? The more difficult issue would be our assumptions about the ‘sample’ Y(1), Y(2), …, Y(n). I would be very careful about that. Basically we need to be so careful that ‘past results actually do help us estimate future performance’ or some such.

            For ‘singularities’, these are well accounted for in the distribution of X. The ‘black swan’ problem is different: Well, the ‘efficient market hypothesis’ holds that the increments in stock prices are independent (if independence does not hold, then some bright traders could make money, and the usual academic assumption is that this is impossible, along with the Greenwich Ferrari dealership, the $2 billion or so J. Simons supposedly paid himself one year, real estate prices in the Hamptons, etc.). Then we might also assume that for some one stock, the increments all have the same distribution (might have to take logs of the stock prices to make this assumption easier to take). Then by the central limit theorem, commonly applied to such ‘independent increment stochastic processes’, we have to conclude that the change in the stock price over the next 30 days will have to be approximately Gaussian. Then we can look at the details of the tails of the Gaussian distribution and start to draw some conclusions about our ‘risk’.

            Then we can sign up some Nobel prize winners in economics, start a mutual fund, something about ‘capital management long term’ or some such, set up an office in Greenwich, argue that we can have really high leverage, maybe over 100:1, and still have low risk, and then, presto, find our offices with security guards out front and ourselves meeting in NY Fed offices with unhappy big bankers on the other side of the table, an unhappy Fed Chairman, various alarmist news headlines, etc., all because of something that happened in Asia that supposedly had too low a probability to happen for centuries. Well, we were working in the long tails of the Gaussian distribution. But we had a Gaussian distribution only approximately, from convergence from the central limit theorem; we were not very careful about arguing just how fast the convergence had to be; and we had assumptions, likely not exactly true, about increments independent and identically distributed. So, net, a ‘black swan’ is just from something out in a tail of our distribution, especially when we assumed that the distribution was Gaussian before we had gotten good convergence from the central limit theorem that far out in the tails.

            In our problem for the LP and venture firm A, for our X we are not assuming that its distribution is Gaussian. So, the distribution we are assuming for X may have ‘black swans’ far out in the tails. If so, then so be it. Our assumptions about the expectation of X and X^2 are still meager and not rendered unreasonable by the possibilities of black swans out in the tails.

            Generally in applications of math to real problems, the assumptions need to be considered carefully. Then the assumptions become hypotheses of theorems, and the conclusions of those theorems lead us to our results. If the theorems (and computing) are solid, then the assumptions are all that can go wrong. So, we are on a ‘logical journey’; the assumptions take us part of the way, and the theorems (and computing) take us the rest of the way. So, the theorems make PART of the trip more reliable. We have to be careful all along, but we have to be especially careful about the assumptions. Or, we could take the approach of data mining, machine learning, and artificial intelligence: F’get about the assumptions and maybe even the theorems and just hack code!

            You wrote:

            “Also in this step:
            || Z(n) – E[X] ||_2^2 = E[ (Z(n) – E[X])^2 ]
            = E[ (Z(n) – E[Z(n)])^2 ]
            = Var( Z(n) )
            you replace E[X] with sequence Z(n) in order to later describe how Z(n) converges to E[X] for large n.”

            We had already shown that for each n, our estimator Z(n) is unbiased, that is, that E[Z(n)] = E[X]. So in the algebra I just replaced E[X] with E[Z(n)].

            On

            “Var( Z(n) ) = Var( ( Y(1) + Y(2) + … + Y(n) ) / n )
            = ( 1/n^2 ) Var( ( Y(1) + Y(2) + … + Y(n) ) )
            = ( 1/n^2 ) Var( Y(1) ) n
            = Var( Z(1) ) / n
            So, as n grows, our error
            || Z(n) – E[X] ||_2 = Std( Z(1) ) / n^(1/2)”

            should read

            “Var( Z(n) ) = Var( ( Y(1) + Y(2) + … + Y(n) ) / n )
            = ( 1/n^2 ) Var( ( Y(1) + Y(2) + … + Y(n) ) )
            = ( 1/n^2 ) Var( Y(1) ) n
            = Var( Y(1) ) / n
            So, as n grows, our error
            || Z(n) – E[X] ||_2 = Std( Y(1) ) / n^(1/2)”

            I regret the error and will punish my fingers severely! The correct result

            || Z(n) – E[X] ||_2 = Std( Y(1) ) / n^(1/2)

            is fine for our purpose of showing that our estimators Z(n) converge in L^2 to what we are estimating, E[X].

            You asked: “With regards to the central limit theorem: ‘From the sum of the Y(i) and the central limit theorem, for n over a few dozen, the distribution of Z(n) will be close to Gaussian.’ With this statement are you pegging E[X] as Gaussian?”

            No. E[X] is just a number. For each positive integer n, Z(n) is our estimator of E[X] from having a sample (the Y(i)’s) of size n. For each n, our estimator Z(n) is a random variable; that is, with n fixed, if we do another trial, then we will likely get a different value for Z(n). So, what is the distribution of Z(n)? Well, from the sum of the Y(i) and the central limit theorem, for n over a few dozen, the distribution of Z(n) will be close to Gaussian.

            On your “On big data: (I wish I could annotate your comment to split off better side discussions) …”, we are not talking about the same things. The result

            || Z(n) – E[X] ||_2 = Std( Y(1) ) / n^(1/2)

            is, from the assumptions, just true. It’s not a guess or a ‘model’ in the usual senses of a ‘model’. If there is a ‘model’ in my approach it is the

            Z = ( Y(1) + Y(2) + … + Y(n) ) / n

            and using Z to estimate E[X]. Here my ‘model’ is about the simplest possible: I’m just taking an ‘average’. The rest of my work was to show some of the good properties of such an average: It’s an unbiased estimator, has minimum variance (for the data it uses), and as the sample size n increases the estimator converges in L^2 to the number to be estimated. In contrast, the ranking work was more complicated than just taking an average, was just intuitive (heuristic), and has no good, known properties and little or nothing to recommend it except intuition.

            For: “Is there a large difference between model fitting (agreed there are good and poor techniques), and assumed distributions?”

            It’s not appropriate to say that my work has “assumed distributions”: In what I did, the crucial distribution was of X and the corresponding sample. That the return X is a real random variable is justified; then X must have a distribution because every real random variable has a distribution. About this distribution, I assumed only that E[X] and E[X^2] exist and are finite, and those assumptions are meager. Then with no more assumptions, I was able to get some solid results: I got an estimator that has minimum variance and is unbiased and, for large sample sizes, converges in L^2 to the right answer. The ranking work gave no such good results; we have no desirable properties for the results of the work.

            The careful work in statistical models, e.g., C. Radhakrishna Rao, ‘Linear Statistical Inference and Its Applications: Second Edition’, ISBN 0-471-70823-2, John Wiley and Sons, New York, usually starts with an assumption that it has a model and knows that it is correct, and has some data and knows a lot of assumptions about that data. The model is done except just for some ‘parameters’. So the work is to estimate the parameters, the accuracy of the estimates, etc. Given the assumptions, the math there is rock solid. We get a lot of good information on the accuracy of the parameters and the predictions of the model. Sure, in practice, just how we know that we have a correct model and some of the other assumptions, e.g., ‘homoscedastic’, is a big, HUGE issue and where, again, the good students tend to upchuck.

            Or, consider planetary motion: We know that the path is an ellipse. So, we can take our data from observing the planet and estimate the parameters in our ellipse. So, we KNEW we had an ellipse. If we didn’t know we had an ellipse, then we would be back in the days of Ptolemy and his epicycles trying to fit the ‘big data’ he had. We would be centuries from Galileo, Copernicus, Kepler, and, especially, Newton. Lesson: Knowing the form of the model is a BIG DEAL, and not knowing it is a bummer.

            But what has happened in ‘machine learning, data mining, big data, parts of artificial intelligence’, all essentially in computer science, is to ignore assumptions and theorems, just guess at models, maybe use some of the calculations from, say, linear statistical inference, especially the normal equations or cases of least squares (or least sum of absolute values if we want to use L^1, least worst case error if we want to use L^infinity, i.e., Chebyshev approximation), write some code, call it an ‘algorithm’, and report the results. With only such work, the results have little to nothing to recommend them. A medical analogy would be snake oil.

            For such ‘machine learning’, etc., there are some ways to do a little better. The standard way is to take the data, divide it into two batches, get a model that fits the first batch well, and then see how well the model fits the second batch of data. Even to take this approach of two batches seriously we need some assumptions about the ‘batches’ as ‘samples’; without some such assumptions, the second batch of data is free to be the same or nearly the same as the first batch and, thus, fit without question.

            My point: Stuff that comes out of a computer should have known, good properties. E.g., minimum variance, unbiased, and converges in L^2. The vanilla way to get such properties is to make some justified assumptions, use those as hypotheses of theorems that have solid proofs, and use the results of the theorems to do the calculations for the results. Intuitive and heuristic techniques, ‘machine learning, data mining’, etc. are not providing us with computer output with good, known properties.

            What I am saying is old stuff: E.g., suppose we want to build a radio antenna. One way is to get some wire coat hangers, make a tangle, rotate it until we get the best results, and call it done. Another way is to know the frequency band of the signal, the direction of the signal, its polarization, and do some basic antenna pattern calculations, and then build the antenna. The second way gives us some known, good properties. As we try to move forward with computer applications, we will be better off with known, good properties than just guessing, intuition, and heuristics in essentially black boxes that we have to test just empirically. What I’m saying is standard operating procedure in applied math, engineering, and physical science and its applications.

            On E[X|Y] = f(Y) and f minimizes || X – f(Y) ||_2: the Y can be a big thing, some huge number, even uncountably infinite, of random variables. The X is just a real random variable. We can regard f(Y) as the solution to the ‘non-linear least squares’ problem in using Y to estimate X. The X might be in the future, and the Y may have data from the past, present, or whatever. The result doesn’t involve time. The result follows from the Radon-Nikodym theorem in measure theory and has a famous proof by von Neumann.

            Since you have a background in physics, you might like measure theory since it has a more powerful version of the integral of calculus and also provides the foundations of probability as in Neveu. For integration, you get some good results on when an integral exists (darned tough to find when one doesn’t!), taking limits under the integral sign, differentiation under the integral sign, and, in multiple integrals, interchange of order of integration. Also you get a good result on when the Riemann integral exists: If and only if the function is continuous everywhere except on a set of measure 0. Nice.
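
            (A minimal sketch of the “two batches” check described just above, on synthetic, assumed data: a high-degree polynomial fits the first batch almost perfectly but typically does worse on the held-out second batch, while a low-degree model generalizes.)

```python
# Sketch of the "two batches" (holdout) check: fit on one batch, score on the
# other. Synthetic nearly-linear data; the setup is illustrative only.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)
y = 2.0 * x + 0.3 * rng.standard_normal(40)   # true relationship is linear

x_fit, y_fit = x[:20], y[:20]      # first batch: used for fitting
x_hold, y_hold = x[20:], y[20:]    # second batch: held out

for degree in (1, 15):
    model = Polynomial.fit(x_fit, y_fit, degree)
    fit_err = np.mean((model(x_fit) - y_fit) ** 2)
    hold_err = np.mean((model(x_hold) - y_hold) ** 2)
    print(f"degree {degree}: fit error {fit_err:.3f}, holdout error {hold_err:.3f}")
```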

          2. Mark Essel

            For shame that I typed E[X] as a random variable; please forgive such an error. I have read and enjoyed your response, particularly your talk of starting a fund ;). Let’s learn with data from large funds as well as individual company returns, all while making wagers.

            As to the subtleties of the unbiased estimator, I respect such a weighting system more than the investor rank described above (which I still consider intriguing). Let the quality of the returns ultimately decide which is the better method.

            We need to take care when a distribution is hypothesized and used to predict the outcome for a single given event, in our case an IRR. While that event is a compilation of many other events, it is itself only a single draw from another random variable. Broad trends such as expected values and deviations are reasonable for large sample sizes which trail the present, but for single draws they provide little guidance as to precisely what a value will be. Such distributions are helpful for yielding confidence intervals.
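
            (One way to see the distinction being drawn here, with toy, assumed numbers: the confidence interval for the mean E[X] tightens as the sample grows, while the spread of a single future draw of X stays wide.)

```python
# Toy illustration: the confidence interval for the mean shrinks like
# 1/sqrt(n), but the range of a single future draw stays wide.
import numpy as np

rng = np.random.default_rng(3)

for n in (10, 100, 1000):
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # assumed stand-in for X
    m, s = sample.mean(), sample.std(ddof=1)
    half_width = 1.96 * s / np.sqrt(n)                    # ~95% CI for E[X]
    spread = np.percentile(sample, 97.5) - np.percentile(sample, 2.5)
    print(f"n={n}: CI for E[X] is {m:.2f} +/- {half_width:.2f}; "
          f"middle 95% of single draws spans {spread:.2f}")
```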

          3. Mark Essel

            ps, I think I missed where you initially showed E[Z(n)] = E[X]; I’ll reread your initial comment.

          4. FantasyDecathlon

            uncle

  18. Tom

    are VCs really investing anymore? i find that the projects they want are not easily found

  19. Steve Poppe

    Being “original” is a wonderful trait. My social media targeting theory, Posters Vs. Pasters, gets its traction from this idea. Love the first investor thought.