Data News and Articles

How I Got Here: Experimenting with Careers Led Justin Elszasz to Baltimore City's Chief Data Officer Role — As a career path, data science didn’t even have a degree associated with it back in 2003, when Elszasz was an undergrad at Case Western Reserve University in Cleveland, Ohio. Yet here he is, with no master’s in data science or degree in political science, a dedicated civil servant doing the work he loves. Tags: MD, Bmore, CDO
To Regulate AI, Try Playing in a Sandbox — There’s rising interest in using “regulatory sandboxes” to police AI without hamstringing innovation. Tags: AI, sandbox, development, policy, compliance
Building the Modern Data Team — Scalable, managed data warehouses have made it easy to get started with an analytical database, and tools like dbt have made it easy for analysts to manage the complexity of their business within a version-controlled environment using the power of SQL and templating languages. The number of job postings for analytic engineers continues to grow as companies demand more sophistication from their analysts. But while the architectural questions seem largely on their way to being answered, one question that remains open is how to build effective data teams. Tags: management, data, analytic engineers
What the Heck is a Data Mesh? — Data meshes have clearly struck a nerve. Some don’t understand them, while others believe they’re a bad idea. Yet, “Demystifying Data Mesh” and “Putting Data Mesh to Work” articles abound. Tags: data mesh
8 Lessons from 20 Years of Hype Cycles — As a VC at Icon Ventures and a twenty-year veteran of productizing and marketing high tech for VMware, Netscape, and others, I've always been fascinated by how new technologies emerge and come to market. One of the major artifacts that tries to capture the state of our market and industry each year is the annual Gartner Hype Cycle. Tags: management, IT, patterns
What, Exactly, Does a Data Product Manager Do? — Data product management (and technical product management) are in much the same place that data science was in 2014. People generally knew it was important. People generally understood that it had some different qualities than an analyst role. People described what it was at their company, creating a huge variety of “what is a data scientist” articles across the industry. Tags: management, analysis
Cascades in Machine Learning — Data is a foundational aspect of machine learning (ML) that can impact performance, fairness, robustness, and scalability of ML systems. Paradoxically, while building ML models is often highly prioritized, the work related to data itself is often the least prioritized aspect. This data work can require multiple roles (such as data collectors, annotators, and ML developers) and often involves multiple teams (such as database, legal, or licensing teams) to power a data infrastructure, which adds complexity to any data-related project. As such, the field of human-computer interaction (HCI), which is focused on making technology useful and usable for people, can help both to identify potential issues and to assess the impact on models when data-related work is not prioritized. Tags: ML, data, management
Data Mesh Principles and Logic — Our aspiration to augment and improve every aspect of business and life with data demands a paradigm shift in how we manage data at scale. While the technology advances of the past decade have addressed the scale of volume of data and data processing compute, they have failed to address scale in other dimensions: changes in the data landscape, proliferation of sources of data, diversity of data use cases and users, and speed of response to change. Data mesh addresses these dimensions. Tags: data mesh, data, management
My Journey to Deep Learning — I find that deep learning gives me the best results for most problems I tackle, including solving problems that previously were out of reach. Furthermore, I find that deep learning generally requires less manual tweaking, leading to fewer errors and quicker results. Here, I discuss what I've learned on this journey, and describe why I believe nearly all data scientists should invest heavily in becoming effective deep learning practitioners. Tags: ML, deep learning
Data Tools and Resources

SQLFluff — SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters. Tags: SQL, ELT
AutoGluon — AutoGluon automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data. Tags: ML, modeling
Vulture — Vulture finds unused code in Python programs. This is useful for cleaning up and finding errors in large code bases. If you run Vulture on both your library and test suite you can find untested code. Tags: Python, clean-up
Orchest — Orchest is a browser-based IDE for data science. It integrates your favorite data science tools out of the box, so you don’t have to. The application is easy to use and can run on your laptop as well as on a large-scale cloud cluster. Tags: IDE
Kobra — Kobra is a visual programming language for machine learning, built by data scientists and engineers to make ML easy to learn and experiment with. Tags: ML, programming
Flat Data — Flat explores how to make it easy to work with data in git and GitHub. It builds on the “git scraping” approach pioneered by Simon Willison to offer a simple pattern for bringing working datasets into your repositories and versioning them, because developing against local datasets is faster and easier than working with data over the wire. Tags: GitHub, developer tools
Observable — A dot plot visualizes a univariate (1D) distribution by showing each value as a dot and stacking dots that overlap. Dot positions are calculated by a “dot density” estimator that attempts to place dots close to their true values in order to better represent the distribution. The dot size is meaningful along both the x- and y-axes, such that the y-axis range and chart height are determined according to the dot diameter. Tags: visualization, data
How To's and Tutorials

Ace Your Next Data Science Interview — Get better at data science interviews by solving a few questions per week. Join thousands of other data scientists and analysts practicing for interviews! Tags: interview, skills, data science
A From-Scratch Tour of Bitcoin in Python — We are going to create, digitally sign, and broadcast a Bitcoin transaction in pure Python, from scratch, and with zero dependencies. In the process we’re going to learn quite a bit about how Bitcoin represents value. Tags: blockchain, Python
Spark Learning — Use this guide to learn about the different components of Spark and as reference material. It covers the topics you need to get started with Spark theory. Tags: Spark
Opportunities
Share Your Project
Have you been working on a data project and are ready to share your methods, processes, or results? Contact us to get started.
Be a Do-Gooder
Are you looking for a way to get involved in the community and make an impact? Check out the volunteer opportunities with U.S. Digital Response.
Book Review Opportunity
Are you interested in reviewing an O'Reilly book for the publisher and sharing your views with the world? As if that isn't enough, you get to take a book home to enjoy as well. Send us an email and we'll get you started.
Data Analysis Volunteer Work to Support Baltimore City
Are you a data expert willing to mentor, or an up-and-coming hobbyist looking for a side project? We have put together a group to work on a few problems using Baltimore City data, and we need your help. The current project focuses on data parsing and analysis for the Baltimore Board of Estimates. If interested, please send us an email or join us on Slack to discuss building a side project group.
Considering a Career Change?
Are you a software or system engineer, data scientist, analytic developer, or cybersecurity expert interested in learning about new opportunities?
Please send us an email to learn about the opportunities available with our partners.
Are You Hiring?
If your company is looking for data scientists, data engineers, software engineers, or other data-related experts, please reach out so that we can help our members find new opportunities.
Please send us an email introducing your company and needs.
Get Involved with Data Works!
Want to be more involved in our data science community? If you have experience running workshops or hackathons or curating newsletters, or are just interested in helping to grow the meetup, please send us an email!
Erias Ventures
Erias has an immediate need for Software Engineers, System Engineers, Test Engineers, Data Scientists, and System Administrators. External referral bonuses are available. For more information, please contact us at info@eriasventures.com.