Learn from Software Engineers and Discover the Joy of ‘Worse is Better’ Thinking
Recently, I have had the fortune of speaking to a number of data engineers and data architects about the problems they face with data in their businesses. The main pain points I heard time and time again were: not knowing why something broke; getting burnt by high cloud compute costs; taking too long to build data solutions or complete data projects; and needing expertise in many tools and technologies. These problems aren’t new. I’ve experienced them, and you’ve probably experienced them too. Yet we can’t seem to find a solution that solves all of these issues in the long run. You […]
Moving data around can be slow. Here’s how you can squeeze every bit of performance out of Python.
Data modeling is a process of creating a conceptual representation of the data and its relationships within an organization or system. Dimensional modeling is an advanced technique that attempts to present data in a way that is intuitive and understandable for any user. It also allows for high-performance access, flexibility, and scalability to accommodate changes in business needs.In this article, I will provide an in-depth overview of data modeling, with a specific focus on Kimball’s methodology. Additionally, I will introduce other techniques used to present data in a user-friendly and intuitive manner. One particularly interesting technique for […]
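As a rough illustration of what a dimensional model looks like in practice, here is a minimal sketch of a Kimball-style star schema, assuming a hypothetical retail sales process: one fact table of numeric measures keyed to two dimension tables of descriptive attributes. All table names, columns and sample rows are illustrative assumptions, and SQLite is used only so the example runs without a database server.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by dimension tables.
SCHEMA = """
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240115
    full_date  TEXT NOT NULL,
    month_name TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
-- Fact table: measures at a declared grain (one row per product per day)
-- plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold  INTEGER NOT NULL,
    revenue     REAL NOT NULL
);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20240115, "2024-01-15", "January"),
        (20240210, "2024-02-10", "February"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Espresso Machine", "Appliances"),
        (2, "Coffee Beans", "Groceries"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20240115, 1, 2, 400.0),
        (20240115, 2, 10, 120.0),
        (20240210, 2, 5, 60.0),
    ])
    return conn


def revenue_by_category_and_month(conn: sqlite3.Connection) -> list:
    # A typical dimensional query: join facts to dimensions, then group by
    # descriptive attributes that a business user understands.
    query = """
        SELECT p.category, d.month_name, SUM(f.revenue) AS total_revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY p.category, d.month_name
        ORDER BY p.category, d.month_name
    """
    return conn.execute(query).fetchall()
```

Keeping measures in the fact table and descriptive attributes in the dimensions is what lets a query like this read almost like the business question itself; a real design would start by declaring the grain of the business process before choosing columns.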
Navigating the Latest GenAI Announcements: July 2024
A guide to new models (GPT-4o mini, Llama 3.1, Mistral NeMo 12B) and other GenAI trends
Introduction
Since the launch of ChatGPT in November 2022, it feels like almost every week there’s a new model, novel prompting approach, innovative agent framework, or other exciting GenAI breakthrough. July 2024 is no different: this month alone we’ve seen the release of Mistral Codestral Mamba, Mistral NeMo 12B, GPT-4o mini, and Llama 3.1, amongst others. These models bring significant enhancements to areas like inference speed, reasoning ability, coding ability, and tool calling performance […]
Whether you are a Data Engineer, Machine Learning Engineer or Web Developer, you ought to get used to this tool
How the antic sun shines upon PydAntic users. Image by Vladimir Timofeev under license to Ilija Lazarevic.
There are quite a few use cases where Pydantic fits almost seamlessly. Data processing, among others, benefits from using Pydantic. It can also be used in web development for parsing and structuring data in expected formats.
Today’s idea is to define a couple of pain points and show how Pydantic can address them. Let’s start with the most familiar use case: data parsing […]
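To make the data-parsing use case concrete, here is a minimal sketch assuming a hypothetical `Order` record arriving as an untrusted dictionary (for example, decoded JSON). The model, its fields and the `parse_order` helper are illustrative assumptions, not the article's code; the sketch sticks to constructor-based validation, which behaves the same way in Pydantic v1 and v2.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    # Hypothetical model: field names and types are illustrative assumptions.
    order_id: int
    customer: str
    items: List[str]
    created_at: Optional[datetime] = None


def parse_order(raw: dict) -> Optional[Order]:
    """Parse an untrusted dict into an Order, returning None if it is invalid."""
    try:
        # Pydantic validates each field against its annotation and coerces
        # compatible values, e.g. the string "42" becomes the int 42 and an
        # ISO 8601 string becomes a datetime.
        return Order(**raw)
    except ValidationError as exc:
        print(f"Rejected record: {len(exc.errors())} validation error(s)")
        return None


# Usage: a well-formed record parses, a malformed one is rejected.
good = parse_order({"order_id": "42", "customer": "Ana", "items": ["book"],
                    "created_at": "2024-07-01T12:00:00"})
bad = parse_order({"order_id": "not-a-number", "customer": "Ana", "items": []})
```

The design choice here is to validate once at the boundary, so the rest of the pipeline can rely on `order_id` being an `int` and `created_at` being a `datetime` rather than re-checking raw values everywhere.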
How to stop worrying and love the data
Definition: eval (short for evaluation). A critical phase in a model’s development lifecycle: the process that helps a team understand whether an AI model is actually doing what they want it to do. The evaluation process applies to all types of models, from basic classifiers to LLMs like ChatGPT. The term eval is also used to refer to the dataset or list of test cases used in the evaluation.
Depending on the model, an eval may involve quantitative, qualitative, or human-led assessments, or all of the above. Most evals I’ve encountered […]
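To show both senses of "eval" in one place (the dataset of test cases and the process that scores a model against it), here is a minimal sketch of a quantitative eval, assuming the simplest possible setup: each case has an expected answer and the score is exact-match accuracy. The `EvalCase` type, `run_eval` function and the keyword-based toy model are hypothetical stand-ins; a qualitative or human-led eval would replace the exact-match check with rubric scoring or human review.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    # One test case in the eval dataset (hypothetical structure).
    prompt: str
    expected: str


def run_eval(model: Callable[[str], str], cases: List[EvalCase]) -> Tuple[float, List[EvalCase]]:
    """Score a model on an eval set using case-insensitive exact match.

    Returns the fraction of cases passed and the failing cases, which are
    often more informative than the headline number.
    """
    failures = [
        case for case in cases
        if model(case.prompt).strip().lower() != case.expected.strip().lower()
    ]
    accuracy = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return accuracy, failures


def toy_sentiment_model(text: str) -> str:
    # Hypothetical stand-in for a real model: any callable str -> str works.
    return "positive" if "love" in text.lower() else "negative"


EVAL_SET = [
    EvalCase("I love this product", "positive"),
    EvalCase("Terrible experience", "negative"),
    EvalCase("Not bad at all", "positive"),
]

accuracy, failures = run_eval(toy_sentiment_model, EVAL_SET)
```

Here the toy model misses "Not bad at all", so it passes two of the three cases; inspecting `failures` rather than only `accuracy` is what tells the team why the model falls short.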