Some of the project topics are too small to be a 2-person project.
If you want to work on a specific one please let me know.

=================================
Possible projects for Sta 635:
=================================

Added Feb. 2008.

(1) Use empirical likelihood to test hypotheses about the mean residual life time.
[see my notes: empirical likelihood and mean residual time]

Write an R function, based on the code el.cen.EM2(), to test whether the mean
residual time at a given age is equal to a given number of years. The function
should search for the minimum over a value automatically.

Prove the lemma that minimizing over a parameter turns a chi-square
statistic with df=2 into a chi-square statistic with df=1.

Some examples or simulations.


Added 2005.

(0) A more efficient algorithm (than grid search)
    to find a confidence region (for dim >= 2) from likelihood ratio values.

(0.5) Sample size determination for the logrank test. The influence of
censoring etc. Survey of software. Demonstration of a free package.
See http://www.biostat.wisc.edu/~kosorok/renyi.html 
Ref: Sample Size Calculations in Clinical Research
by Shein-Chung Chow, Jun Shao and Hansheng Wang 

http://www.childrens-mercy.org/stats/weblog2004/survival.asp

http://www.jhsph.edu/Research/Centers/CCT/javamarc/Shih/shihsizeuserguide.htm
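
As a starting point for (0.5), here is a minimal Python sketch of one commonly used approximation, Schoenfeld's formula, for the number of events a two-sided logrank test needs under proportional hazards. The function names are made up for illustration. The way censoring enters (dividing the required events by an assumed overall event probability) is a simplifying assumption, not the method of any package listed above.

```python
from math import ceil, log
from statistics import NormalDist


def logrank_events(hazard_ratio, alpha=0.05, power=0.80, alloc=0.5):
    """Approximate number of events needed (Schoenfeld's formula).

    alloc is the fraction of subjects assigned to group 1.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 / (alloc * (1 - alloc) * log(hazard_ratio) ** 2)


def logrank_sample_size(hazard_ratio, event_prob, **kwargs):
    """Total subjects needed if a fraction event_prob is expected to have
    an event (the rest censored). event_prob is an assumed input."""
    return ceil(logrank_events(hazard_ratio, **kwargs) / event_prob)


if __name__ == "__main__":
    for p in (1.0, 0.7, 0.4):
        print(p, logrank_sample_size(0.5, p))
```

Heavier censoring (smaller event_prob) inflates the total sample size while the required number of events stays fixed, which is the influence of censoring the project should quantify more carefully.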

(1) A special type of regression model: y_i = a + b x_i + U_i e_i
where U_i = exp(r x_i), and e_i has an extreme value distribution.

How to estimate a, b, r parametrically? (MLE?)
Is there a two-step least squares procedure?
What about censored data?
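
To make the model in (1) concrete, here is a minimal Python sketch that simulates from it and evaluates the log-likelihood, including right-censored observations. It assumes e_i follows the standard minimum extreme value distribution (the log of a unit exponential), with density exp(e - exp(e)) and survival function exp(-exp(e)). The function names and the uniform covariate are illustrative choices only.

```python
import random
from math import exp, log


def simulate(n, a, b, r, seed=0):
    """Draw (x, y, delta) from y = a + b*x + exp(r*x)*e, all uncensored."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        w = rng.expovariate(1.0)
        while w == 0.0:          # guard against log(0)
            w = rng.expovariate(1.0)
        e = log(w)               # standard minimum extreme value draw
        data.append((x, a + b * x + exp(r * x) * e, 1))
    return data


def loglik(params, data):
    """Log-likelihood; delta = 1 for observed y, 0 for right-censored y."""
    a, b, r = params
    total = 0.0
    for x, y, delta in data:
        z = (y - a - b * x) / exp(r * x)
        if delta:
            total += z - exp(z) - r * x   # log density, with scale Jacobian
        else:
            total += -exp(z)              # log survival function
    return total


if __name__ == "__main__":
    sample = simulate(500, a=1.0, b=2.0, r=0.5, seed=1)
    print(loglik((1.0, 2.0, 0.5), sample))
```

Maximizing loglik over (a, b, r) with any general-purpose optimizer gives the MLE. The censored branch shows what changes when some y_i are only known to exceed the recorded value.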

(2) Bootstrap applications in survival analysis. There are many...
(distribution of the logrank test statistic for small sample sizes, etc.)

(3) Piecewise exponential: all the related stuff, carried to the limit...
    including some R coding.
    * Interval censored data MLE: how does the estimator change as the
    number of pieces grows?
    * How to do a proc lifereg with a piecewise exponential error
    distribution, with fixed cutting points (not changing with subjects)?
    How do things change as the number of pieces grows? etc.
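
For the simplest case of (3), right-censored data with fixed cutting points, the MLE of each piece's hazard is the number of events in the piece divided by the total time at risk in the piece. A minimal Python sketch follows (the function name is made up). Interval censored data has no such closed form and is not covered here.

```python
def piecewise_exp_mle(times, events, cuts):
    """MLE of the hazard on each interval (lo, hi] defined by cuts.

    times  : observed times (event or censoring), all > 0
    events : 1 if the time is an event, 0 if right-censored
    cuts   : increasing interior cutting points, same for all subjects
    """
    edges = [0.0] + list(cuts) + [float("inf")]
    k = len(edges) - 1
    deaths = [0] * k
    exposure = [0.0] * k
    for t, d in zip(times, events):
        for j in range(k):
            lo, hi = edges[j], edges[j + 1]
            if t > lo:
                exposure[j] += min(t, hi) - lo   # time at risk in piece j
                if d and t <= hi:
                    deaths[j] += 1               # event falls in piece j
    # A piece nobody reaches has no information: report NaN.
    return [dj / ej if ej > 0 else float("nan")
            for dj, ej in zip(deaths, exposure)]


if __name__ == "__main__":
    t = [1, 2, 3, 4]
    d = [1, 0, 1, 1]
    print(piecewise_exp_mle(t, d, []))     # one piece: plain exponential
    print(piecewise_exp_mle(t, d, [2]))    # two pieces
```

Calling it with more and more cutting points shows how the estimate behaves as the number of pieces grows: pieces with no events get hazard zero, and the implied cumulative hazard approaches a step function.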

(4) Stability/robustness of the Kaplan-Meier/Nelson-Aalen estimator
under observation error/perturbation (values observed with error),
under errors in the censoring indicator, etc.

(5) Implement the Lin and Wei (1992) paper (3 pages) on the Buckley-James
estimator and compare it with existing (EL) methods.

(6) Cox model with a surviving fraction: model assumptions and 
implementation. Compare with regular Cox model.

=====================================================================
(0). Estimation (Kaplan-Meier, Nelson-Aalen) with late entry/early withdrawal
data. The variance estimator. Confidence intervals: comparison of
several methods.
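
For (0), the only change late entry makes to the Kaplan-Meier and Nelson-Aalen estimators is the risk set: a subject counts as at risk at time t only if entry < t <= exit. A minimal Python sketch follows, with a Greenwood-type variance for the survival estimate. The function name and the output layout are made up for illustration.

```python
def km_na_late_entry(entry, exit_, event):
    """Kaplan-Meier and Nelson-Aalen estimates with delayed entry.

    entry : time each subject comes under observation (entry < exit)
    exit_ : event or withdrawal time
    event : 1 if exit is an event, 0 if a withdrawal (censoring)
    Returns rows (t, n_at_risk, deaths, KM, NA, Greenwood variance of KM).
    """
    event_times = sorted({t for t, d in zip(exit_, event) if d})
    surv, cumhaz, gw = 1.0, 0.0, 0.0
    rows = []
    for t in event_times:
        n = sum(1 for e, x in zip(entry, exit_) if e < t <= x)
        d = sum(1 for x, dl in zip(exit_, event) if dl and x == t)
        surv *= 1.0 - d / n
        cumhaz += d / n
        if n > d:                      # Greenwood term undefined when n == d
            gw += d / (n * (n - d))
        rows.append((t, n, d, surv, cumhaz, surv ** 2 * gw))
    return rows


if __name__ == "__main__":
    for row in km_na_late_entry([0, 0, 1, 2], [2, 3, 4, 5], [1, 0, 1, 1]):
        print(row)
```

With late entry the risk set can be very small at early times, which is where the different confidence interval methods are likely to disagree most.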

(1). One-sample log-rank and other rank tests. Compare one
sample to the general population (census) data (from the survival
package ratetables). Comparison of several methods (accuracy of p-value).

Also may include the covariates of race, age and sex in the test 
(adjust for covariate).

Survival package in R (Splus) and its instructions should be helpful.
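
The basic one-sample log-rank statistic for (1) compares the observed number of events O with the number expected under the reference population, E = sum of H0(t_i) over subjects, where H0 is the reference cumulative hazard. Z = (O - E)/sqrt(E) is then approximately standard normal under the null. A minimal Python sketch follows. The constant-hazard reference in the example is only a stand-in for a real rate table, and all subjects are assumed to enter at time 0.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_logrank(times, events, cumhaz):
    """One-sample log-rank test against a known cumulative hazard.

    times  : follow-up time of each subject (entry at time 0 assumed)
    events : 1 if the subject had the event, 0 if censored
    cumhaz : function t -> reference cumulative hazard H0(t)
    Returns (observed, expected, z, two-sided p-value).
    """
    observed = sum(events)
    expected = sum(cumhaz(t) for t in times)
    z = (observed - expected) / sqrt(expected)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return observed, expected, z, p


if __name__ == "__main__":
    t = [1, 2, 3, 4]
    d = [1, 0, 1, 1]
    # Hypothetical reference population with constant hazard 0.1 per year.
    print(one_sample_logrank(t, d, lambda s: 0.1 * s))
```

Covariates such as race, age and sex enter through cumhaz: each subject would get the cumulative hazard of the matching stratum of the rate table, so cumhaz would take the subject's covariates as well as the time.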


(2). How to choose the weight function in the weighted log-rank type
tests to maximize power? (more theoretical)
Reference: Gill's book (censoring and stochastic integrals)
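
For simulation work on (2), here is a minimal Python sketch of a two-sample weighted log-rank statistic. It uses the Fleming-Harrington family of weights w(t) = S(t-)^rho (1 - S(t-))^gamma, with S the pooled Kaplan-Meier estimator, as one convenient example of a weight family. This choice and the function name are assumptions for illustration, not a recommendation from the reference above.

```python
from math import sqrt


def weighted_logrank(times, events, group, rho=0.0, gamma=0.0):
    """Weighted log-rank test; group is 1 or 0 for each subject.

    rho = gamma = 0 gives the ordinary log-rank test.
    Returns (U, V, Z): weighted observed-minus-expected sum for group 1,
    its hypergeometric variance estimate, and Z = U / sqrt(V).
    """
    event_times = sorted({t for t, d in zip(times, events) if d})
    s_prev = 1.0            # pooled Kaplan-Meier just before current time
    u = v = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, group) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if e and x == t)
        d1 = sum(1 for x, e, g in zip(times, events, group)
                 if e and x == t and g == 1)
        w = s_prev ** rho * (1.0 - s_prev) ** gamma
        u += w * (d1 - n1 * d / n)
        if n > 1:
            v += w * w * n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
        s_prev *= 1.0 - d / n
    z = u / sqrt(v) if v > 0 else float("nan")
    return u, v, z


if __name__ == "__main__":
    t = [1, 2, 3, 4, 5, 6]
    d = [1, 1, 1, 1, 1, 1]
    g = [1, 1, 1, 0, 0, 0]
    print(weighted_logrank(t, d, g))                    # ordinary log-rank
    print(weighted_logrank(t, d, g, rho=0, gamma=1))    # late differences
```

Running the same simulated data through several (rho, gamma) pairs gives an empirical picture of how the weight function trades power between early and late differences.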

(3). Similar to (2), but demonstrate that the power of the log-rank test
can suffer under a non-proportional hazards alternative $H_A$. And the possible fix
(apply the test only on a certain time interval? ad hoc?
or more systematically?).
%How to handle late entry/early withdraw data?

(3.5) (added 2006) Combine two tests: a logrank test and a test
for crossing hazards. Evaluate the (power) properties.

(4). Testing the hypothesis of equality of two medians, and a confidence
interval for the difference of two medians.
(use R function discemlik() or emplik.Hs.test())
See also (7); this should work in close connection with (7).
Better or worse than the log-rank test?
(If you want to work on this, I have some more info.)

(5). Residuals in the parametric regression model (lifereg) and the semi-parametric
regression model (phreg). Types of residuals. How do they behave under
correct and wrong models (mis-specification, ...)?  Simulation. Plots.


(6). Frailty Model. An introduction and example. Use the exponential
regression model with a random effects term to explain.

Some useful references: Book by Therneau and Grambsch, Tech Report by
Therneau, Grambsch and Pankratz. 

(7) Confidence interval estimation of the median with censored data via empirical
likelihood, el.cen.EM().  Variations: use a smoothed
indicator function, either a linear smoother or a cubic smoother.
Cubic: G(t) = (3/(4 \sqrt 5)) (t - t^3/15 + 2 \sqrt 5/3) ,   for |t| < \sqrt 5
and zero or one otherwise.
Linear: G(t) = t + 0.5 ,   for  |t| < 0.5
and zero or one otherwise.
Compare to the performance with the plain indicator function.
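
The three candidate functions in (7) can be sketched in Python as below. The cubic smoother is written in normalized form, 1/2 + (3/(4 sqrt 5)) (t - t^3/15), so that it rises continuously from 0 at -sqrt 5 to 1 at sqrt 5 (it is the integrated Epanechnikov kernel). The bandwidth h and the helper smoothed_indicator are assumptions added for illustration.

```python
from math import sqrt


def indicator(t):
    """Plain indicator I[t >= 0]."""
    return 1.0 if t >= 0 else 0.0


def linear_smoother(t):
    """G(t) = t + 0.5 on |t| < 0.5, zero or one otherwise."""
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return t + 0.5


def cubic_smoother(t):
    """Normalized cubic on |t| < sqrt(5), zero or one otherwise."""
    r5 = sqrt(5.0)
    if t <= -r5:
        return 0.0
    if t >= r5:
        return 1.0
    return 0.5 + 3.0 / (4.0 * r5) * (t - t ** 3 / 15.0)


def smoothed_indicator(x, theta, h, smoother):
    """Smoothed version of I[x <= theta] with (assumed) bandwidth h > 0."""
    return smoother((theta - x) / h)


if __name__ == "__main__":
    for x in (0.0, 0.9, 1.0, 1.1, 2.0):
        print(x, indicator(1.0 - x),
              smoothed_indicator(x, 1.0, 0.5, linear_smoother),
              smoothed_indicator(x, 1.0, 0.5, cubic_smoother))
```

As h shrinks, both smoothed versions approach the plain indicator, so the comparison in the project amounts to studying how the confidence interval behaves as a function of h.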


(8) Efficiency comparison between a Cox model and an exponential/Weibull
model when all models are valid (and under small departures).
Under various model specifications: beta value, censoring percentage etc.
(see my Notes). (I will do this in class, so you cannot do it again :-( ).

(9) Compare the performance of two versions of the logrank test: one from
proc lifetest, the other the score test from proc phreg.
Try them on continuous and discrete distributions. (small project)
Which has better power for small samples? Or more accurate p-values?

(10) Implement a version of log-rank test for a non-parametric 
sample versus a Weibull sample. Use simulations to evaluate several
variations of the test.

