Data Science for the Disenchanted

By Ben Tyler Elliott

Just because you may have given up on your childhood dreams of winning a Fields Medal doesn’t mean you have to give up on a data-driven career. There’s a whole field and industry for folks like us: overeducated liberal arts majors who are passionate about finding answers that aren’t already on Wikipedia.

What is Data Science?

Data science is all about extracting meaning from data. It’s a field that combines statistics, computer science, and domain expertise to make sense of large data sets. Data scientists analyze data to find trends and patterns that can be used to make better decisions or irreverent blog posts.

Why Data Science?

Computers, as it turns out, are not going away. The world is becoming data-driven and gamified, which means there is an increasing amount of data being generated each day, by everyone, and everything, everywhere, all the time. In 2020, IBM produced a piece ("How to manage complexity and realize the value of Big Data," Smarter Business Review, May 29, 2020; retrieved October 20, 2022, from https://www.ibm.com/blogs/services/2020/05/28/how-to-manage-complexity-and-realize-the-value-of-big-data/) citing a 2018 study that concluded that 2.5 quintillion bytes of data are created every day. To be completely reductive: That’s a lot of 0s and 1s. It’s roughly 0.058 × the number of arrangements of a 3×3×3 Rubik’s cube (4.3×10^19), which I suppose is an even less relatable comparison. So how about this. If you had one lemon for every zero in 2.5 quintillion (the number written out sans scientific notation), you would have seventeen lemons. As businesses become more reliant on data, the demand for data scientists will only increase.
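If you’d rather not take my word for the arithmetic, here’s a quick sanity check in Python. Treat it as a sketch: it assumes the short-scale quintillion (10^18) and reuses the rounded Rubik’s cube figure quoted above, and the variable names are mine, not IBM’s.

```python
# Back-of-the-envelope check of the numbers in the paragraph above.
# Assumption: "quintillion" is short scale, so 2.5 quintillion = 2.5 × 10^18.
BYTES_PER_DAY = 2_500_000_000_000_000_000

# Rounded count of 3×3×3 Rubik's cube arrangements, as quoted in the text.
RUBIKS_ARRANGEMENTS = 4.3e19

# How many Rubik's cubes' worth of arrangements is one day of data?
ratio = BYTES_PER_DAY / RUBIKS_ARRANGEMENTS

# One lemon per zero when the number is written out in full.
lemons = str(BYTES_PER_DAY).count("0")

print(f"Fraction of a Rubik's cube: {ratio:.3f}")
print(f"Lemons: {lemons}")
```

Counting characters in a string is a silly way to count zeros, but it’s also the most honest one: it does exactly what you’d do with a pencil.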

What’s the Best Way to Become a Data Scientist?

If you’re interested in pursuing a career in data science, the best way to start might be to brush up on your math skills. I know, I know—you probably thought you were done with math after calculus. Stop whining. A strong foundation in math is super handy as you pursue a successful career in data science.

Once you’ve gotten your math skills up to par, you can begin learning programming languages like Python, or R, or Python. These languages will allow you to manipulate data sets and run statistical analyses.

Finally, you need to gain some domain expertise. This can be achieved most readily by working in a specific industry or sector OR by completing a master’s or PhD program in data science.

What If I Don’t Want to Do Any of That?

Like many disenchanted millennials, disillusioned Gen-Xers, disabused Zoomers, and dyspeptic Boomers, you may be feeling a little discouraged these days. Maybe you’re struggling to find your place in the workforce or pay off your student loans. Maybe it’s time for a career change. Or maybe you’ve just always wanted to learn how to computer like those hackers on NCIS:DUI.

And so you Google around, and you find all sorts of articles like this one that say that all you have to do is dedicate the next few years of your life to a curriculum, and then you’ll be ready to get started.

Soon, you start feeling like a fraud—like you’re not smart or talented enough for the field. Don’t worry—you’re not alone. This feeling is known as imposter syndrome and it’s actually a good thing. In fact, if you don’t have it, you’re probably not trying hard enough.

Fake It ’Til You’re Fired

Guess what? Nobody knows what they’re doing, and everyone feels like an imposter. The sooner you can embrace this, the easier it’ll be for you to get started. So long as you don’t show up drunk, set anything on fire, or habitually over-promise and under-deliver, you’ll be fine.

So Where Do I Start?

If you’ve made it this far without closing the tab, you’ve already started! Here are some reasonable next steps:

Get Yourself a Fancy GitHub

Sooner or later, you’ll need a GitHub profile. Do yourself a favor and Google Jonathan Soma. He’s got about fifty different websites and walkthroughs re: getting started in Data Science, and his stuff is mostly geared around journalists who don’t already have a strong technical background. If you can Google your way to his resources, you know enough to get started. GitHub is invaluable when working in the field, but looking like you have your shit together on GitHub is just as important. And not just because it’s important for people to be able to see that it looks like you know what you’re doing.

See, getting a GitHub profile organized requires becoming familiar with common, language-agnostic best practices. If you can get a not-hideous GitHub profile up, you’re in good shape to start learning how to start sifting through data.

Choose a Miniature Project

And then do it. Start in Excel if you need to. Find a question that doesn’t have a readily available answer, but for which you’ve got access to a lot of free data.

If you can’t come up with an idea, use this one:

What’s the weather like where you were born? How have average annual temperatures changed there while you’ve been alive?

Start with something as small as that. Find some numbers, moosh them around, write it up, and put it online. It doesn’t matter if the entire article comprises a 2x3 table and a one-paragraph writeup. It doesn’t even matter if your analysis is right. All that matters is looking like you know what you’re doing and talking through your process. So just do it.

And then do it again, and again, and again, forever.

But Nobody on NCIS:DUI Uses Excel

Look, data science is a rapidly growing field with immense potential for those who are willing to put in the work. And the work involves nothing more than learning how to investigate questions that you don’t know how to investigate. Programming languages like R and Python are probably going to end up in your toolbox at some point, since these languages will allow you to manipulate data sets and run statistical analyses. But nobody needs to start there.
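For whenever you do get there, here’s roughly what the birthplace-weather question might look like in Python. This is a minimal sketch, not the One True Method: the readings are MADE UP for illustration, the function names are my own invention, and the “trend” is a plain least-squares slope, which is about the bluntest instrument available.

```python
from collections import defaultdict
from statistics import mean

# MADE-UP readings for illustration only: (year, average temperature in °C).
# Swap in real numbers from whatever free weather data you can find.
readings = [
    (1990, 9.0), (1990, 11.0),
    (1991, 10.0), (1991, 11.0),
    (1992, 10.5), (1992, 11.5),
]

def annual_averages(rows):
    """Group readings by year and average each year's temperatures."""
    by_year = defaultdict(list)
    for year, temp in rows:
        by_year[year].append(temp)
    return {year: mean(temps) for year, temps in sorted(by_year.items())}

def trend_per_year(averages):
    """Least-squares slope of average temperature against year.

    Needs at least two distinct years, or the denominator is zero.
    """
    years = list(averages)
    temps = list(averages.values())
    year_mean, temp_mean = mean(years), mean(temps)
    numerator = sum((y - year_mean) * (t - temp_mean) for y, t in zip(years, temps))
    denominator = sum((y - year_mean) ** 2 for y in years)
    return numerator / denominator

averages = annual_averages(readings)
slope = trend_per_year(averages)

for year, avg in averages.items():
    print(f"{year}: {avg:.1f} °C")
print(f"Trend: {slope:+.2f} °C per year")
```

Everything here uses only the standard library, so there’s nothing to install. The same moosh-the-numbers-around steps work just as well in a spreadsheet: one column for year, one for temperature, an average per year, and a trendline.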

Experience the Domain

You need to gain some domain expertise, but this doesn’t mean you’ve gotta go get a master’s. Almost everything you need can be achieved through executing personal projects and pushing your own boundaries. Make it a point to incorporate some new tool or technique in every project.

And put it online, coward. Most people are empathetic and like to see people put themselves out there. And everyone else is a total waste of your time.

Why worry about everyone liking you when you don’t even like everyone?

Data Science for the Disenchanted - October 20, 2022 - Ben Tyler Elliott