A Data Science Lexicon for Non-Data Scientists

October 25, 2023

Written by: Chris Bick, Data Architect

In today’s fast-paced world of Machine Learning and Artificial Intelligence (AI helped me fill in that phrase while writing this post…what a world!) it can be tough for those who aren’t familiar with terms like heteroscedasticity and Cronbach’s alpha coefficient to interact with their technical counterparts. As an architect who works with multiple data scientists, I run into this feeling all too often.

One day, while I was complaining to our DS team that they never met a six-syllable word they didn’t like, they told me to learn to use Google and quit whining. From that helpful advice sprang the Data Science Lexicon for Non-Data Scientists. Enjoy!

Bayesian Prior – The probability you assign to an event before you have seen the evidence. When this concept was first introduced to me, it sounded like a way to bend statistics toward whatever the Data Scientist expected to see in the first place, but I was assured that is not the case. Alternatively…See: https://xkcd.com/1132/
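For the curious, here is a minimal sketch of what a prior actually does, assuming a made-up detector that tells the truth 35 times out of 36 and a made-up prior of one in a million. None of these numbers come from a real study; they are only there to show the arithmetic of Bayes' rule.

```python
# Toy Bayes update. Every number here is invented for illustration.
prior = 1e-6                 # belief that the event happened, before any evidence
p_alarm_if_true = 35 / 36    # assumed: detector tells the truth 35 times in 36
p_alarm_if_false = 1 / 36    # assumed: detector lies 1 time in 36

# Bayes' rule: P(event | alarm) = P(alarm | event) * P(event) / P(alarm)
p_alarm = p_alarm_if_true * prior + p_alarm_if_false * (1 - prior)
posterior = p_alarm_if_true * prior / p_alarm

print(f"belief before the alarm: {prior:.6%}")
print(f"belief after the alarm:  {posterior:.6%}")
```

The alarm does raise the belief, but because the prior was so small to begin with, the updated belief is still tiny. That is the prior doing its job, not the Data Scientist cheating.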

P-Hacking – Manipulating experiments to achieve the desired statistical proof. I like to think of this as the lazy way to accomplish what a Bayesian Prior can do for you. Alternatively…See: https://xkcd.com/882/
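A back-of-the-envelope sketch of why this works so well, assuming 20 independent tests and the conventional 0.05 significance threshold (both numbers are assumptions picked for illustration):

```python
# Chance that at least one of several tests looks "significant" by pure luck,
# when there is no real effect anywhere. Assumes the tests are independent.
alpha = 0.05    # assumed significance threshold
n_tests = 20    # assumed number of things you tested

p_at_least_one_fluke = 1 - (1 - alpha) ** n_tests
print(f"chance of at least one fluke: {p_at_least_one_fluke:.0%}")
```

Under those assumptions the chance of finding at least one "result" comes out to about 64%, which is why running the experiment until something pops is frowned upon.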

Machine Learning – Catch-all term for statistical models and algorithms. Usually this or AI is the center square in the Data Science term Hollywood Squares, so you want to make sure you know this one. Alternatively…See: https://xkcd.com/1838/

Deep Learning – I’m going to sidestep the philosophy jokes here and get down to brass tacks. Yes, that’s right, it isn’t brass tax, it is brass tacks. That lesson is more important than knowing this term, so let’s move on. Alternatively…See: https://xkcd.com/1838/, but now imagine that pile of linear algebra is a lot bigger and no one really understands how or why it works.

Neural Network – To be frank, I have never heard anyone offer a definition of this that passes the smell test. I found an entire chapter in my father’s EE textbook dedicated to it (this was from the 1970s, mind you), yet the best definition I have found is “it’s kind of like the human brain…there are nodes!” Do better, DS community. Alternatively…See: https://xkcd.com/2173/

Artificial Intelligence – Skynet, but with WAAAAAY more Seinfeld, Friends, and The Office references mixed in. Plus, it can make a mighty fine chatbot.

The “GPT” in ChatGPT – Generative Pre-Trained Transformer. Rattle that bad boy off to your CDO and enjoy the subtle look of approval they give you!

Word Cloud – A fun visualization that increases the font size of words that appear more often in a body of text. As it turns out, Data Scientists HATE these, so it can be fun to slip them into Slack or Teams chats with your DS team.
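If you want to annoy your DS team from scratch, the sizing logic can be sketched in a few lines. The sample text, the point sizes, and the linear scaling are all assumptions for illustration; real word cloud tools also handle layout, stop words, and colors.

```python
import re
from collections import Counter

# Made-up sample text; swap in your own.
text = "data data data model model insight"

# Count how often each word appears.
counts = Counter(re.findall(r"[a-z']+", text.lower()))

# Scale font size linearly with frequency (assumed sizes, in points).
MIN_PT, MAX_PT = 10, 48
top = max(counts.values())
sizes = {word: MIN_PT + (MAX_PT - MIN_PT) * n / top for word, n in counts.items()}

for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.0f}pt")
```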

Disastrous Environmental Impact from AI and ML – Ummmm…Next definition!

Data Scientist – A person tasked with extracting meaningful insights from business data. Also, a title that will likely enable you to ask for a healthy raise.

Algorithm – A term in use for thousands of years that Data Scientists have tried to co-opt as their own. Not happening on my watch. No definition.

Supervised Learning – I like to think of this definition like Wayne from Wayne’s World is saying it: “It’s like, a model, you know, but, we already know what we want it to say. So then we see if it says it!”

Unsupervised Learning – Garth’s response: “But what if you, like, don’t know what you want it to say…is there a thing for that?”

Overfitting – The lazy overachiever’s approach to predictive modeling; essentially just playing connect the dots. Alternatively…See: https://xkcd.com/2048/
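Here is a toy sketch of the connect-the-dots problem, using invented data where the real pattern is y = 2x and the noise alternates between +1 and -1. The "connect the dots" model simply memorizes every training point; the straight line is an ordinary least-squares fit. Both models and all the numbers are assumptions for illustration.

```python
xs = [0, 1, 2, 3, 4, 5]
train_ys = [1, 1, 5, 5, 9, 9]     # y = 2x with noise +1, -1, +1, ...
test_ys = [-1, 3, 3, 7, 7, 11]    # same xs, fresh noise (signs flipped)

def mse(predict, ys):
    """Mean squared error of a model against a set of answers."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Model 1: memorize the training data exactly (the overfit).
memorized = dict(zip(xs, train_ys))

def connect_the_dots(x):
    return memorized[x]

# Model 2: a plain least-squares straight line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(train_ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, train_ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def straight_line(x):
    return intercept + slope * x

for name, model in [("connect the dots", connect_the_dots),
                    ("straight line", straight_line)]:
    print(f"{name}: train error {mse(model, train_ys):.2f}, "
          f"test error {mse(model, test_ys):.2f}")
```

The memorizer is perfect on the data it has already seen and worse than the boring straight line on the data it has not, which is overfitting in a nutshell.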

That’s all for now. If you enjoyed this, wait until we skewer the Security team in the next one!
