R Programming-Bioinformatics Class (6930) Fall 2018

Class Syllabus: Course Catalog

Title of the course:

Essentials of Next Generation Sequencing (NGS) & Microarray analysis.

Number of credits which can be allocated for this course:

4 Credits.

Course Semester:

Students can register for this course in the summer, fall, or spring semester.

Instructor Name:

Mr. Shrikant Pawar (LinkedIn)

Office: 586 Petit Science Center.

Georgia State University.

Phone: 404-431-0213. Email: spawar2@student.gsu.edu

Office Hours: Monday-Friday, 10am-5pm

Class Time:

2 hours and 30 minutes.

Pre-requisites for this course:

This course is aimed primarily at graduate students majoring in biology who are actively involved in research on gene expression analysis. It is also open to undergraduate students and to graduate students outside biology who want to learn applications of R programming in biology. Because biology students are assumed to have no programming background, there are no specific programming or non-programming course prerequisites for registering, although background knowledge of biological processes and some programming techniques will be useful.

Purpose of course:

The need to include this course in the current academic setting can be explained by answering the following questions:

1) What is Bioinformatics?

Bioinformatics is a recent field of science that designs software tools for research in the life sciences. Today, the quantity of biological data accumulated by laboratories is daunting. As a result, the data can no longer be dealt with ‘manually’, and bioinformatics has become essential. With the advent of big data, the need for bioinformatics training as an integral part of life science research is becoming increasingly apparent. For the first time, an international consortium of bioinformatics educators and trainers from across the globe has come together to transcend institutional and international boundaries and share bioinformatics training expertise, experience, and resources. The Global Organization for Bioinformatics Learning, Education & Training (GOBLET), which includes The Genome Analysis Centre (TGAC), is focusing on developing a training portal into a global, community-centered resource and on supporting activities to aid the next generation of bioinformaticians.

2) Is there a need for Bioinformatics?

According to the Human Genome Project, the human genome contains approximately 20,000 genes. This large amount of data can be studied to identify each gene and its specific function in different organisms. Storing, accessing, and analyzing such data requires specialized computing skills. With advances in sequencing techniques, nucleotide sequences from many different organisms are being determined and deposited in databases on the internet. Interpreting biological data, predicting 3-D structures of biomolecules, and constructing evolutionary trees that help us trace the ancestry of different organisms are specific applications of bioinformatics techniques.

The first phase of an analysis (NGS read mapping, microarray normalization, genotyping) can be carried out with various bioinformatics tools. The second phase, biological interpretation, is the crux of any research. If a PI brings raw microarray data to a graduate student and asks for normalized data, that calls for a skilled, expert service. The usual plan is to use licensed software to analyze these data sets, which has several disadvantages. Because such software is generic, it provides only a fixed analysis pipeline, and modifications are not permitted. Skill in R programming removes these restrictions and lets the student build a custom pipeline for such data sets. Furthermore, GSU has to pay for commercial software licenses, a cost that can be avoided if students use open source tools to obtain similar or even better results. It is a research manifesto: if students contribute to the experimental design, understand the biological question, and provide analysis with insights beyond a list of differentially expressed genes or a heat map, then they are doing biology.
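As a rough illustration of what a custom normalization step can look like, here is a minimal sketch in base R. It uses quantile normalization, which is one common choice for microarray data and is picked here only as an example; it is not necessarily the method used in class. The function name quantile_normalize and the toy numbers are assumptions made for this sketch.

```r
# Quantile-normalize the columns (arrays) of a numeric matrix (rows = probes).
# Hypothetical helper written for illustration; not taken from any package.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x), nrow(x) > 1)
  # Rank the values within each array; ties are broken by order of appearance
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted arrays at each rank position
  reference <- rowMeans(apply(unname(x), 2, sort))
  # Replace every value by the reference value that has the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy data: 4 probes measured on 3 arrays (made-up numbers)
raw <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
norm <- quantile_normalize(raw)
print(norm)
```

After this step every array has the same distribution of values, and only the ordering of probes within each array differs. Because the code is plain R, any step (for example the tie-breaking rule) can be changed, which is the kind of flexibility a fixed commercial pipeline does not offer.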

3) Tools available for Bioinformatics analysis?

The beauty of this field lies in the open source nature of most bioinformatics analysis tools. Most of the tools we will use are available free of charge from the Bioconductor platform. Many packages can be downloaded from this platform, and it is continually updated with newer and better tools as they are developed.

4) Bioinformatics tools at Georgia State University?

Since GSU does not have a separate bioinformatics department, students and their PIs rely on professional bioinformaticians or collaborate with bioinformatics departments at other universities, such as Georgia Tech or Emory, for their data analysis. As a result, collaborators receive credit on the resulting grants and published articles. If students perform the analysis themselves, most of that credit stays with them, and they need not share their research projects and credit with anyone else. By the end of this course, students should be self-sufficient in various data analysis techniques for NGS and microarrays. Moreover, since very high impact journals such as Nature Genetics and Science insist that researchers back up their wet lab data with bioinformatics analysis, this course will be crucial for students’ research. Nature Genetics now has an “Analysis” paper track that reflects the importance of these techniques.

GSU has several resources for bioinformatics analysis. GSU students have free access to server space (SNOWBALL) for analyzing their NGS data. GSU also provides High Performance Computing (HPC) clusters (OCTAN, CARINA, ORION) free of charge to speed up such analyses.

5) Future prospects of Bioinformatics?

Bioinformaticists are called upon to answer questions about data. Previously, their role was to run an algorithm on a database to provide the answer. But the field has evolved from a service, like histology, into a research arena of its own, and bioinformaticists are now the motor of innovation. Bioinformaticist positions are decentralized and located within different therapeutic areas. More often than not, hiring decisions are based on the immediate needs of the team, especially given the interdisciplinary nature of the work. According to Genentech, one of the giants among companies working in bioinformatics, “There are 100% more job opportunities opening up in bioinformatics than ever before,” much of which is driven by an increase in venture capital investment.

Proposed syllabus (Subject to content additions on approval):

Sr.No Topics to cover

1. Introduction to NGS, Microarrays, Databases.

2. Introduction to R environment and Bioconductor packages.

3. Unix/Linux operating system basics, Installing R and packages. Getting familiar with R commands.

4. Introduction to operating the GSU server and HPC cluster.

5. Start-up R exercise for sample NGS data.

6. Follow-up on Start-up R exercise for sample NGS data.

7. Start-up R exercise for sample Microarray data.

8. Follow-up on Start-up R exercise for sample Microarray data.

9. Exam 1.

10. Different algorithms used in NGS and Microarray analysis.

11. Different algorithms used in analysis, Application of Bioconductor analysis packages-NGS.

12. Gene expression analysis packages-NGS, Data visualization: Heatmaps, Pie-charts, Venn diagrams.

13. Application of Bioconductor analysis packages-NGS, Data visualization: Heatmaps, Pie-charts, Venn diagrams.

14. Different algorithms used in analysis, Application of Bioconductor analysis packages-Microarray.

15. Gene expression analysis packages-Microarray, Data visualization: Heatmaps, Pie-charts, Venn diagrams.

16. Application of Bioconductor analysis packages-Microarray, Data visualization: Heatmaps, Pie-charts, Venn diagrams.

17. Exam 2.

18. Project Presentation.

Research Project:

Requirements
This will be an individual project.

Topics
o Each student must propose their topic. The topic must be related to NGS or Microarrays.
o This can be a research project (which may lead to a research paper), a survey (which may lead to a survey paper), or a programming project (which may lead to a master's project).

A proposal must be emailed to me with the following information by the deadline indicated.
o Title
o Abstract
o Introduction
o Materials and Methods
o Results
o Discussion
o Conclusion
The report should be at least 10 pages long, single-spaced, double column, in 12 point Times New Roman, and written as an Overleaf document. Generate a PDF and email it to me to receive full points. Each student will present their project at the end of the semester.

Deadlines
Final report due date: one day before Exam 2.

Withdrawals: The last day of regular withdrawal.

Course Requirements: Students should attend all classes and regularly complete all outside reading, project work, and other assignments.

Course Grades: Homework 20% (lowest will be dropped), Exam 1 20%, Final Project 30%, Final Exam 30%.

Other Policy:

Make-ups and deadline extensions must be arranged in advance and are generally not allowed.

Any material submitted for the grade should be the student’s own work.

Collaboration is allowed prior to preparation of actual material that will be submitted for the grade.

Class Notes: Overleaf