Data analysis and selection for statistical machine translation