You've likely heard of ChatGPT. There are numerous articles and blogs online discussing its capabilities. If you have not, it is a chatbot from OpenAI that lets users type in questions, which ChatGPT then answers. The bot uses generative AI methods, including both supervised and reinforcement learning, to produce text-based responses. It is designed to be conversational, meaning it answers questions and often improves its responses through follow-up dialogue. One task it performs is writing software programs.
Can it be used to write SAS® or Python code? Here at Magpie, we utilize both programming languages to build solutions that help healthcare organizations achieve their analytic objectives. Let's start with a SAS® example and see how ChatGPT performs:
The CLASS dataset is probably one of our favorite datasets in the SASHELP library. It is a small, 19-observation dataset with five variables: Name, Sex, Age, Height, and Weight. We use it to build examples for clients and staff and for general testing.
ChatGPT responded with the correct answer:
ChatGPT even provided an explanation for the code, which is useful for documenting code:
Next, we asked ChatGPT to:
We were curious to see how ChatGPT would answer the question. ChatGPT answered this prompt correctly, aside from a few minor tweaks that were required. See the code and explanation provided by the bot below:
The minor changes required were: 1) the first letter in the "Sex" and "Age" variables needed to be capitalized and 2) the local path to "sashelp.class.sas7bdat" needed to be added to the read_sas call.
It was interesting to see ChatGPT use the read_sas method, recognizing that a SAS® dataset can be read directly into Python with pandas. We often use this method in projects where we move data back and forth between SAS® and Python, a use case we are seeing more and more often.
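As a rough illustration of the two tweaks described above, the sketch below wraps `pandas.read_sas` in a small helper and shows why the capitalization mattered: pandas column labels are case-sensitive. The helper name `load_sas_dataset`, the file path, the encoding choice, the toy rows, and the filter are all our assumptions for illustration, not ChatGPT's output.

```python
import pandas as pd


def load_sas_dataset(path):
    """Read a .sas7bdat file into a DataFrame.

    Passing an encoding makes pandas decode character columns to str;
    without it, text comes back as raw bytes. "latin-1" is an assumption
    and should match how the file was actually written.
    """
    return pd.read_sas(path, format="sas7bdat", encoding="latin-1")


# Hypothetical local path; point this at your own copy of the file.
# class_df = load_sas_dataset("/path/to/class.sas7bdat")

# Toy stand-in using the CLASS column names (rows are made up).
class_df = pd.DataFrame(
    {
        "Name": ["Pat", "Sam", "Lee"],
        "Sex": ["F", "M", "F"],
        "Age": [13, 14, 12],
        "Height": [60.0, 65.5, 58.0],
        "Weight": [90.0, 110.0, 85.0],
    }
)

# Column labels are case-sensitive: "Age" exists, "age" does not.
has_capitalized = "Age" in class_df.columns
has_lowercase = "age" in class_df.columns

# An arbitrary example filter using the correctly capitalized labels.
girls_13_plus = class_df[(class_df["Sex"] == "F") & (class_df["Age"] >= 13)]
print(girls_13_plus)
```

Referring to `class_df["age"]` in this sketch would raise a `KeyError`, which is the kind of error the capitalization fix avoids.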
Next, we decided to give ChatGPT more of a challenge. Because the bot is trained on a massive amount of internet data, and the internet has many examples that use the CLASS dataset for ChatGPT to learn from, we thought a made-up example might be harder. A new dataset may give the bot some trouble.
Here is the next prompt we provided ChatGPT:
Below is the initial SAS code that ChatGPT provided:
This code is not correct, so perhaps we had found the limits of ChatGPT. First, in a PROC SORT BY statement the DESCENDING option applies to the variable that follows it, so "Age" and "descending" need to be swapped to avoid a SAS® error. Second, in the last DATA step ChatGPT applied the BY statement and FIRST.variable approach to age. That keeps the first observation within each age group: if there are multiple patients aged 99 and multiple patients aged 65, it keeps one observation for each of those ages. The result is one row per distinct age, not the set of oldest patients.
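To make the second problem concrete, here is a small Python sketch of what a BY-group plus FIRST.variable step does to data sorted by descending age. The patient records are made up for illustration, and `itertools.groupby` stands in for the SAS® BY-group logic; this is not the code ChatGPT produced.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample records (hypothetical names and ages).
patients = [
    {"name": "A", "age": 99},
    {"name": "B", "age": 65},
    {"name": "C", "age": 99},
    {"name": "D", "age": 65},
    {"name": "E", "age": 99},
]

# Sort by descending age, as the PROC SORT step intended.
by_age_desc = sorted(patients, key=itemgetter("age"), reverse=True)

# Keep only the first record in each age group (the FIRST.age behavior).
first_per_age = [
    next(group) for _, group in groupby(by_age_desc, key=itemgetter("age"))
]

for row in first_per_age:
    print(row["name"], row["age"])
```

The result has one row per distinct age: A (99) and B (65). Patients C and E, who are also 99, are dropped, while B is kept despite not being among the oldest.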
Given that ChatGPT is conversational and purports to improve results with additional dialogue, we resubmitted the prompt. One of the things we didn't like about the initial code suggested by ChatGPT was the number of steps required. So, we asked a very simple follow-up question:
ChatGPT responded:
Impressive: ChatGPT understood we wanted more concise code with fewer steps. It also recognized SQL via PROC SQL as an option within SAS®. The generated code used a single step, which we copied and pasted directly into SAS® and ran. SAS® produced a dataset called oldest with 3 observations. The 3 oldest patients in our sample dataset were age 99.
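For readers who want to see the general shape of a single-step SQL approach, the sketch below runs a max-age subquery against an in-memory SQLite table from Python. It is our own illustration, not the PROC SQL code ChatGPT generated. The table name `patients`, its columns, and the rows are made up, and SQLite's dialect differs from PROC SQL in places.

```python
import sqlite3

# In-memory database with a made-up patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, 99), (2, 65), (3, 99), (4, 72), (5, 99), (6, 65)],
)

# One statement: the subquery finds the maximum age, and the outer
# query keeps every patient whose age equals it.
oldest = conn.execute(
    """
    SELECT patient_id, age
    FROM patients
    WHERE age = (SELECT MAX(age) FROM patients)
    ORDER BY patient_id
    """
).fetchall()
conn.close()

print(oldest)
```

With these toy rows the query returns the three patients aged 99, including ties that a first-row-per-group approach would miss.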
Our initial hypothesis was that ChatGPT would produce better Python code than SAS® code. We were surprised to find that producing a Python solution to our sample use case appeared to be more of a challenge for ChatGPT, which produced several incorrect solutions. This is likely why Stack Overflow has temporarily banned answers generated by ChatGPT. Perhaps our programming jobs are safe for the near-term future.
It is important to remember that ChatGPT is designed to engage users in a dialogue, which means asking the bot follow-up questions and rephrasing prompts. One of the mistakes that ChatGPT made in both the SAS® and Python code was choosing a group-by approach and not recognizing that two or more patients may share the oldest age. The key to solving this use case is finding the maximum age and then finding all the patients with that same age. With this key in mind, we updated our ChatGPT prompt, and it produced a working solution:
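The following is not ChatGPT's output but our own minimal pandas sketch of that key idea: compute the maximum age, then keep every row that matches it. The column names and rows are made up for illustration.

```python
import pandas as pd

# Made-up sample data (hypothetical IDs and ages).
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4, 5, 6],
        "age": [99, 65, 99, 72, 99, 65],
    }
)

# Step 1: find the maximum age in the data.
max_age = patients["age"].max()

# Step 2: keep all patients whose age equals that maximum, ties included.
oldest = patients[patients["age"] == max_age]
print(oldest)
```

Because the filter compares every row against the maximum, all tied patients are returned rather than just the first one encountered.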
So, can we use ChatGPT to write SAS® or Python code? The short answer is yes. However, coders need to be prepared to validate and experiment, especially as the use cases get more specific and esoteric. ChatGPT is fallible, so users need to be wary of the bot's responses. Despite this issue, we plan to experiment more with ChatGPT in our daily work. Our hypothesis is that it may provide better coding prompts and solutions than simply searching Google, or at a minimum move us toward a working solution more quickly.
Finally, when we need a break or a laugh from a long day of coding, we can have fun using ChatGPT by asking it to perform silly tasks such as:
While ChatGPT may not write quite as well as Shakespeare, it produced a few lines that were good for a quiet laugh before we got back to work for the day.