At Alight Analytics we use Python for our ETL system as well as our production models. Much of the reason for choosing Python was happenstance but, looking back, it was a good choice. There are many things to consider when choosing a language; I’ll discuss a few reasons why Python might be a good choice for you.
- There is a very active, accepting, and helpful Python development community. Python libraries are accessible, extensive, and growing. The packages developed and shared under open source licenses decrease the time it takes to deploy software and the amount of code that needs to be maintained in house. You can learn more about this community here: https://www.python.org/community/. Many cities also have Meetup groups dedicated to spreading knowledge and ideas about Python development.
- A large part of the Python development community is made up of scientists. Python is the language of choice for many data scientists in particular. At Alight, we do a lot of modeling across large and sometimes dirty datasets, so this is important.
- Because so many data scientists use Python, there are very helpful reference books and online materials that aid in this type of work. As an example, Kaggle provides a good document for people interested in using Python for data science.
- There are Python interfaces that let you use libraries from other statistical and mathematical modeling languages, such as R. Check out rpy2 at: http://rpy.sourceforge.net/.
- Python is interpreted, so you can do a lot of analysis without developing a well-organized program. Analysts spend a lot of time in the investigation or data interrogation phase, and this type of work leads to messy code. Developers can take a mess of code written by a statistician or data scientist and organize it into a production system without switching languages!
- The “Python Data Analysis Library,” or pandas for short. According to http://pandas.pydata.org/, “pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.” I already mentioned Python libraries in the first point; however, pandas deserves a special mention. It is easy to use, can handle very large datasets, and is robust enough to handle most data wrangling problems you may encounter. As I alluded to previously, marketing data is oftentimes poorly formatted, and pandas enables us to reliably format that data for reporting and analysis.
- Developers like Python. It is fully object-oriented, platform agnostic, and lends itself to very readable code. There are also very good tools for Python development that are familiar to developers and analysts with varied backgrounds.
- Python can be set up on existing hardware regardless of platform and won’t cost you a dime to get started.
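
To make the pandas point above more concrete, here is a minimal sketch of the kind of cleanup it makes easy. The sample data, the column names, and the `clean_marketing_data` function are all hypothetical and made up for illustration; this is not our production code, only one plausible way to tidy a poorly formatted export.

```python
import io

import pandas as pd

# Hypothetical messy export: padded headers, inconsistent campaign names,
# currency symbols and thousands separators in numbers, and a missing value.
RAW_CSV = """ Campaign ,Date,Spend ,Clicks
Spring Sale,2015-03-01,"$1,200.50",340
spring sale ,2015-03-02,$980.00,
Brand Search,2015-03-01,$450.25,125
"""


def clean_marketing_data(raw_csv: str) -> pd.DataFrame:
    """Return a tidy DataFrame from a messy CSV string (illustrative only)."""
    df = pd.read_csv(io.StringIO(raw_csv))

    # Normalize header names: strip padding and lowercase them.
    df.columns = df.columns.str.strip().str.lower()

    # Make campaign names consistent so they group together.
    df["campaign"] = df["campaign"].str.strip().str.title()

    # Parse dates and turn "$1,200.50"-style strings into floats.
    df["date"] = pd.to_datetime(df["date"])
    df["spend"] = df["spend"].str.replace(r"[$,]", "", regex=True).astype(float)

    # Assumption for this sketch: a missing click count means zero clicks.
    df["clicks"] = df["clicks"].fillna(0).astype(int)
    return df


cleaned = clean_marketing_data(RAW_CSV)

# Roll the cleaned rows up to one line per campaign for reporting.
summary = cleaned.groupby("campaign", as_index=False)[["spend", "clicks"]].sum()
print(summary)
```

Whether a blank click count should really be treated as zero depends on your data source, so treat that step as a placeholder for whatever rule fits your data.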