Fred’s Medical Grand Rounds

Thanks for attending my grand rounds!

Click here to download a PowerPoint of the presentation

Neural Network Playground

Click Here to open the neural network playground.

This is how I demonstrated the benefit of additional layers in a neural network used to predict outcomes for a hypothetical set of hypotensive patients based on EF and the volume of saline received. The model allows you to choose a test dataset, the inputs into the model, the number of hidden layers, and the number of neurons per layer. It also has several other adjustable parameters which we didn't discuss:

Learning Rate: This reflects how quickly the model converges on a solution, i.e., how big a change is made to the model each time the current error is calculated. When the learning rate is too large, however, instead of converging on a solution the algorithm overreacts and may generate increasingly incorrect models.
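
As a rough illustration of this behavior (not something from the playground itself), here is a minimal gradient descent sketch on a made-up one-dimensional loss: a small learning rate walks toward the minimum, while a rate that is too large overshoots and diverges.

# Gradient descent on the made-up loss(w) = w**2, whose gradient is 2*w.
def descend(learning_rate, steps=10, w=1.0):
    for _ in range(steps):
        w -= learning_rate * 2 * w   # one update step
    return w

print(descend(0.1))   # shrinks toward the minimum at w = 0
print(descend(1.5))   # each step overshoots, so w grows instead of converging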

Activation: This refers to the 'threshold' function that is applied to the weighted inputs of each neuron to generate its output. These are different ways to model the 'all-or-none' conduction of real neurons, where the input must reach a certain threshold before a signal is generated.
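
For reference, these are standard definitions rather than anything specific to the playground; a minimal sketch of the common activation functions:

import math

def sigmoid(x):   # smooth squashing of any input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):      # passes positive inputs through, zeroes out the rest
    return max(0.0, x)

def tanh(x):      # like sigmoid, but squashes into (-1, 1)
    return math.tanh(x)

# A neuron applies its activation to the weighted sum of its inputs plus a bias,
# e.g. with hypothetical weights (0.4, -0.7), inputs (1.0, 2.0), and bias 0.1:
print(relu(0.4 * 1.0 + (-0.7) * 2.0 + 0.1))   # prints 0.0: below threshold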

Regularization: Given enough features, as you continue to train your neural network, you can eventually expect near-perfect accuracy on your training dataset. However, this is in part because the model can begin to recognize noise in the data instead of features that actually help predict the outcome you're interested in. For example, suppose you train a model to recognize dysmorphic faces to identify children with a genetic disorder. At first the model may recognize important features common to many patients with the disorder, such as microcephaly or micrognathia. As you continue to train, though, it will begin to recognize features associated with individual images in the training set that are unrelated to the disorder, perhaps the presence of an unrelated mole, or the lighting in one particular picture.

Regularization provides an incentive for your model to use the lowest weights possible for every connection, so that only true features that are consistent across multiple images will be recognized by your model.
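
As a minimal sketch of the idea, assuming a simple mean-squared-error loss (the playground offers L1 and L2 regularization; this shows L2):

# L2 regularization adds a penalty proportional to the sum of squared weights,
# so large weights survive only if they meaningfully reduce the error.
def loss_with_l2(errors, weights, reg_strength):
    data_loss = sum(e ** 2 for e in errors) / len(errors)    # mean squared error
    penalty = reg_strength * sum(w ** 2 for w in weights)    # punishes large weights
    return data_loss + penalty

# Hypothetical numbers: identical errors cost more when the weights are large.
print(loss_with_l2([0.1, -0.2], weights=[0.5, 0.3], reg_strength=0.01))
print(loss_with_l2([0.1, -0.2], weights=[5.0, 3.0], reg_strength=0.01))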

Ratio of Training to Test Data: It is traditional to set aside a portion of your data as 'test data', so that you can assess the validity of your model on data it has never been exposed to in training. Generally 10-20% of the data is held out for this validation, depending on the size of your dataset.
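
As a minimal sketch of the idea (not the playground's own mechanism), a random hold-out split can be done by shuffling the rows and slicing off a fraction:

import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    rows = rows[:]                        # copy so the original list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]   # (training rows, held-out test rows)

train_rows, test_rows = train_test_split(list(range(100)), test_fraction=0.2)
print(len(train_rows), len(test_rows))    # 80 20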

Noise: The test datasets in the neural network playground follow an exact pattern, but by increasing the noise parameter you can create a more realistic, 'messy' dataset in which it is impossible to classify every element precisely.
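
As a rough illustration with made-up data (not the playground's), adding Gaussian jitter to a clean one-dimensional pattern pushes some points across the class boundary, which is exactly what makes a noisy dataset impossible to classify perfectly:

import random

random.seed(0)

# Clean pattern: the label is 1 whenever x is positive.
points = [(x / 10.0, 1 if x > 0 else 0) for x in range(-50, 50)]

# Jitter each x with Gaussian noise so some points land on the 'wrong' side.
noise_level = 0.5
noisy = [(x + random.gauss(0, noise_level), label) for x, label in points]

crossed = sum(1 for x, label in noisy if (x > 0) != (label == 1))
print(crossed, "points now sit on the wrong side of the original boundary")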

Batch Size: This refers to how many data points are used in each round of training, i.e., how many examples the model sees before each update of its weights.
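
As a sketch of the idea (the update_model function below is a hypothetical placeholder, not part of any library), training walks through the dataset one batch at a time and updates the model after each batch:

def iterate_batches(rows, batch_size):
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def train_one_epoch(rows, batch_size, update_model):
    # update_model is a hypothetical placeholder for one gradient step on a batch
    for batch in iterate_batches(rows, batch_size):
        update_model(batch)

# With 100 rows, batch_size=10 gives 10 updates per pass through the data;
# batch_size=100 gives a single update on the full dataset per pass.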

Using transfer learning to classify images

First, download and install Python 3.6 for Windows / Mac

You can verify your installation is working by opening up the command line. On Windows, you can do this by pressing ⊞ Win + R to bring up the Run window and then typing cmd.

You can test that your Python 3.6 installation is working by running the following command:

py -3.6 --version

Install the needed modules with the following commands:

py -3.6 -m pip install numpy
py -3.6 -m pip install tensorflow
py -3.6 -m pip install tensorflow-hub
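
If you want to confirm the modules installed correctly, one quick check (assuming the installs above finished without errors) is to print their versions:

py -3.6 -c "import numpy, tensorflow, tensorflow_hub; print(numpy.__version__, tensorflow.__version__, tensorflow_hub.__version__)"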

Download this code and test data

First you need to train the model on the white blood cell images. Unzip the downloaded folder above, press shift + right click inside the folder, and select “Open Command Window Here”.

Inside the command window, type:

py -3.6 retrainBMP.py --image_dir=WBCs

Your model should now be training! To test the results on a new image, type:

py -3.6 label_image.py --image=WBCs/neutrophil/1.bmp

You will get a list of percentages associated with each image class!

If you're interested in learning more, this code was largely based on this tutorial. Read on to learn how to fine-tune your model and training data so you can get a model that's over 90% accurate!

My database of white blood cell images is modified from the LISC database. Any research published using this database should include the following citation:
Rezatofighi, S.H., Soltanian-Zadeh, H.: Automatic recognition of five types of white blood cells in peripheral blood. Computerized Medical Imaging and Graphics 35(4) (2011) 333–343.

Using transfer learning to classify images (Mac)

First, download and install Python 3.6 for Windows / Mac

You can verify your installation is working by opening Terminal (go to Spotlight and type 'terminal') and running the following:

python3.6 --version

Navigate to the Applications folder, then to the ‘Python 3.6’ folder, and run “Install Certificates.command”.

Install the needed modules with the following commands:

python3.6 -m pip install numpy
python3.6 -m pip install tensorflow
python3.6 -m pip install tensorflow-hub

Download this code and test data

First you need to train the model on the white blood cell images. Unzip the downloaded folder above, then navigate to the folder using the terminal. If the folder is inside your ‘Downloads’ folder, you can type the following to navigate to the folder.


cd Downloads/WBCs

Inside the terminal, type:

python3.6 retrainBMP.py --image_dir=WBCs

Your model should now be training! To test the results on a new image, type:

python3.6 label_image.py --image=WBCs/neutrophil/1.bmp

You will get a list of percentages associated with each image class!

If you're interested in learning more, this code was largely based on this tutorial. Read on to learn how to fine-tune your model and training data so you can get a model that's over 90% accurate!

My database of white blood cell images is modified from the LISC database. Any research published using this database should include the following citation:
Rezatofighi, S.H., Soltanian-Zadeh, H.: Automatic recognition of five types of white blood cells in peripheral blood. Computerized Medical Imaging and Graphics 35(4) (2011) 333–343.

Modeling Breast Cancer Survival with SEER Data

Download the Weka 3 Machine Learning Tool

Run the Weka 3 tool after installing, and choose “Explorer”

In the Preprocess tab, open the file you wish to analyze. You can see how Weka handles comma-separated value (CSV) files with this test data file (you must extract it before selecting it in Weka). In this training dataset, you have two input values and a label. If both inputs are positive, or both are negative, the label is 1; otherwise, the label is 0. This pattern is hard to model with traditional regression, but easy with a neural network.
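
If you want to generate a similar dataset yourself, here is a minimal Python sketch (the column names Train1, Train2, and Label match those used later in this walkthrough; the exact format of the provided test.csv is an assumption):

import csv
import random

random.seed(0)
with open("xor_style_test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Train1", "Train2", "Label"])
    for _ in range(500):
        x1 = random.uniform(-1, 1)
        x2 = random.uniform(-1, 1)
        label = 1 if (x1 > 0) == (x2 > 0) else 0   # 1 when both inputs share a sign
        writer.writerow([round(x1, 3), round(x2, 3), label])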

In the preprocess tab, choose “Open file…”, and select test.csv. Make sure to click “Invoke options dialog”.

In the options dialog, make column 3 a nominal attribute (in this case, indicating that the label is binary), and make columns 1 and 2 numeric attributes. Since logistic regression is used for binary classification, we have to make column 3 binary in order to use a simple logistic regression.

In the Classify Tab, hit the “Choose” button to select your model, and navigate to the Simple Logistic model.

The words “SimpleLogistic” should now show up next to the Choose button. If you click on the word “SimpleLogistic” you can edit parameters of the model, but let’s keep the defaults for now. Under Test options, choose “Use training set”, and select the column “Label” from the dropdown menu to indicate we want to predict the data in the “Label” column of the test.csv file. Hit Start to generate a prediction.

We can see that the logistic regression classified 50% of the test data correctly. We can visualize this data better by right clicking on our trained model in the bottom left, and selecting Visualize classifier errors.

Now plot the Train1 column on the x-axis and the Train2 column on the y-axis. The square boxes indicate that all of the data where X is positive is misclassified.

Can we do better with a neural network? Select Multilayer Perceptron (another name for a neural network) from the Choose menu.

We can change the parameters for the model by clicking on "MultilayerPerceptron", but let's leave these at their defaults for now. Again, hit the Start button to train the model.

We can see from the summary that the classification was much more successful! Again, let's visualize the results.

We can see that nearly all values are correctly classified.
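
If you would rather sanity-check the same comparison in Python instead of Weka, here is a minimal scikit-learn sketch (not used in the talk; the data are generated to follow the same pattern described above):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) == (X[:, 1] > 0)).astype(int)   # 1 when both inputs share a sign

logreg = LogisticRegression().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

print("logistic regression accuracy:", logreg.score(X, y))   # roughly 0.5
print("neural network accuracy:     ", mlp.score(X, y))      # close to 1.0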

You can request access to the SEER data here. You can choose to download the SEER data as text files. If you open the text file in Excel, you can convert it to a CSV. First, you need to select "Text to Columns".

Choose fixed width columns.

Finally, you have to go through the tedious task of choosing column widths that match up with the data file description included with the SEER data.

After setting the column widths, hit finish, and save the file as a .csv file. You can now load it with Weka to analyze! You may need to edit the columns to represent the variables you are interested in. For example, survival in SEER is listed in months, so if you want to generate a logistic regression on 5 year survival, you need to create a new column that lists a 1 if survival >= 60 months, and a 0 otherwise.
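
If the Excel step becomes too tedious, one hedged alternative is to do the same conversion in Python with pandas. The column positions and names below are placeholders; you would substitute the real values from the SEER data file description:

import pandas as pd

# Hypothetical column positions and names; take the real character ranges and
# variable names from the data file description that ships with the SEER export.
colspecs = [(0, 8), (8, 12), (12, 16)]
names = ["patient_id", "survival_months", "cause_of_death"]

df = pd.read_fwf("SEER_export.txt", colspecs=colspecs, names=names)

# Binary 5-year survival label, as described above: 1 if >= 60 months, else 0.
df["five_year_survival"] = (df["survival_months"] >= 60).astype(int)

df.to_csv("seer_breast_cancer.csv", index=False)   # ready to load in Weka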

Additional Resources

Machine Learning Crash Course
Some advanced tutorials for those familiar with Python.
If you're interested in learning to program in Python.

Interns are here!

Hi everyone,

We’ve had an amazing whirlwind of events, and are excited to see all the new smiling faces! The start of the new interns marks the end of an era, and we’re so happy with all the fellowship and job opportunities our graduates have taken. We’re equally happy to know many of you will be staying at Yale (can’t wait for you all to give noon conference!)

Graduation – credit Mia Djulbegovic

That being said, we have had a wonderful time meeting all the new interns, from our work together at the intern bootcamp, to the fun we had at the 6th floor BBQ.

BBQ – credit Steffne Kunnirickal

Some of our fearless PGY2s saved you from a shark (orca?) while riding atop a swan.

Dolphins – credit Leila Haghighat

All of our residents are so excited to work with you! The Donaldson team even made some interesting artwork. Nonetheless, the time they must have spent embodies their enthusiasm to get to work with you!

Donaldson Workroom – credit Chris Sciria

We’re looking forward to seeing you all at report and on rounds. Let us know how things are going, whether good or bad!

Schedules

Hi everyone! Welcome to the inaugural post on the Yale Medicine Chief’s Blog! We’re certain that many of you have questions about the schedules – how is it generated, how do we keep things fair, etc… and we hope to answer all that and more!

Requests:

Certain vacation months were quite popular, and almost no one wanted vacation at the beginning of the year. The Klatskin service was very popular among our residents, whereas interns preferred Duffy and Peters. 60% of interns and 80% of residents had personal requests related to m(p)aternity leave, conferences, and weddings – clearly life happens outside of residency!

Schedule Generation:

To ensure equality, we proceeded with a stepwise process to generate the schedule. First, clinics were assigned as per resident color blocks (for PGY-2s and 3s), and per intern clinic requests (for traditional interns). As many of you know, our traditional internal medicine program is on a 6+2 block system; however, for many years this has not applied to our preliminary and anesthesia interns. For the upcoming year, we also made it a priority to ensure these interns have a lighter rotation every 6 weeks, both to maintain quality of life and to help social groups form among those on each clinic 'color' block.

Following the clinic scheduling, we took into account the special requests that many residents have, including weddings and necessary absences, giving high priority to these important events. To ensure fairness to those who did not request special events, we kept a 'fairness' tally, so that those who were given time for important life events would choose their vacations or electives after those who had no special requests.

Vacations were then distributed for the first 6 months and second 6 months of the year. Each vacation request fulfilled reduces the number of residents available for backup or jeopardy. We thus had to limit vacation requests for a specific block to ensure we maintained a robust backup and jeopardy pool. Those who did not get their first choice in the first 6 months were given a higher priority for the second six months using our fairness tally.

Electives were distributed based on everyone's preferences, as well as the need to ensure that most graduating seniors had an opportunity to work on most rotations during their time here. It is a perennial concern that residents graduate without rotating on the Goodyer cardiology floor service; unfortunately there are just not enough spots on this two-resident service to give every resident this experience. However, all residents will have CCU time during their first and second years (and some during their third years) to ensure a robust cardiology experience. Additionally, we were able to ensure that rising seniors who put this rotation as their first choice and had never done Goodyer were able to have the experience. Similarly, the popularity of Klatskin, our liver service, outweighs our supply. We have a longstanding relationship with the Bridgeport program, which helps staff the service; although this slightly decreases access to Klatskin, without it we would not be able to staff our services and still provide residents with the current amount of elective time. Again, we were able to ensure that rising seniors who ranked this rotation first and had never rotated on the service had access. We were able to ensure 86% of residents got their first choice, and everyone got one of their top two choices. 70% of residents will get to do both their first and second choices!

Finally, we filled in the remaining rotations, with an attempt to balance nights between residents, ensure ICU time was similar and that PGY1s and 2s got experience in both the MICU and CCU, and minimize transition issues where coverage needs to be pulled for a resident starting a daytime rotation after nights.

We understand the schedule isn't perfect: our 82 upper-year residents cover 53 rotations across 26 two-week blocks, and, not accounting for clinic blocks or equality, there would be over 10^49 possible ways to arrange these blocks (and that's just the residents). However, we hope we've been able to satisfy many of your requests, provide you with a balanced schedule, and ensure you get a well-rounded experience in internal medicine.