Developed a comprehensive tutorial demonstrating advanced text mining and data analysis techniques for biomedical literature in R. The project involved downloading cancer-related research abstracts from NCBI's PubMed database and analyzing them with systematic workflows built on the pubmed.mineR package.
• Designed and executed automated text processing pipelines for analyzing medical abstracts from multiple research papers
• Implemented gene atomization algorithms to identify and extract gene symbols (ERAS, HR, IRF5, KRAS, TP53) and their associated biological functions
• Developed term frequency analysis methods to quantify the occurrence of medical terminology across research abstracts
• Created contextual search functions to extract gene-specific sentences and disease-related content from scientific literature
• Integrated PubTator API functionality to retrieve structured annotations for genes, diseases, mutations, chemicals, and species by PMID
• Demonstrated data manipulation techniques, including combining multiple abstract datasets and selective content removal