We start with some parameters describing our wave function, (e.g. orbital coefficients and expansion coefficients ).
And we need some kind of a rule to make an n-electron wave function from λ at a geometry R .
We can compute the energy of this wave function in the usual way
We may formally view this as a function of λ (not any more a bilinear function) and rewrite it according to
This way all geometry dependence is in the Hamiltonian - the actual geometry dependence that comes from the motion of the nuclear charges and the geometry dependence that arises from the wave function expansion (e.g. motion of atom centered basis sets).
The above expression gives you the energy for any kind of expansion λ at any geometry R. Now in practice we will have some kind of a rule (or mapping) on how to create the wave function (e.g. MCSCF) at every geometry.
Now, the geometry dependent energy with respect to this rule is given according to:
Next we can compute the derivative of this expression with respect to a coordinate x in R (derivatives will be denoted with a superscript (x) or (λ)). This requires the use of the chain rule 
or with the definition from above
The first term evaluates the change in the Hamiltonian with the reference wave function. The second term is the gradient of the energy with respect to the expansion parameters multiplied with the change in expansion parameters.
If λ was determined in a completely variational way, then
The corresponding second term cancels and we arrive at the Hellmann-Feynman theorem.
The significance is that with a completely variational method, we do not have to worry about the response of the wave function to the geometrical perturbation and the formulas simplify. In real life MCSCF (and the special case SCF) is the expansion where this holds. For such a wave function the gradient comes directly from combining the density matrix (from last post) with the atomic derivative integrals. CI gradients are more complex because the orbitals are not optimal with respect to the CI expansion coefficients. And in coupled cluster not even the amplitudes are variational.
 R. Shepard. The Analytic Gradient Method for Configuration Interaction Wave Functions. in Modern electronic structure theory. 1995 ed. D.R. Yarkony
 Note that there are three types of coordinates: wave function coordinates λ, coordinates of the Hamiltonian R (I am thinking here mostly about the nuclear geometry but it could be anything else as well) and implicit electronic coordinates that I am not writing out.
- The λ in the wikipedia article corresponds to my R actually (I hope it is still understandable).
- A slightly tricky thing in this context is that in general the same λ can give a different wave function at a different geometry. This happens because atom centered basis sets move around in space with the atoms.
 The notation is kind of difficult with all these different types of coordinates (maybe not written in the optimal way here) but in principle this is a straight forward application of the chain rule. You could also do it with Taylor expansions truncated to first order - but chain rule is prettier ...