Capturing value from advanced analytics in organisations, GPT-4 improved itself even further via Reflexion, new GPT-4 contender, Stanford published AI Index with top trends

...in addition to the importance of standard folder structure, LangChain framework, BloombergGPT, segment anything model by Meta

Apr 10, 2023

Welcome to my Effective AI newsletter, where I share my thoughts about AI and discuss the latest digital news, as well as productivity and creativity tips for the AI community. If you missed my previous editions of this newsletter, you can check them out here. Additionally, you can visit my website for more of my articles, tips and tricks.

This newsletter is also available in podcast form, so if you are interested in listening to it, please click on the player icon below.

🧠 What’s on my mind?

Key pillars to capture value out of advanced analytics in organisations

I’m often asked to give invited lectures at universities, and a topic that people often want to hear about is “How can organisations use advanced analytics (AA) for their competitive advantage?”. In this article I’d like to share the framework that I came up with in 2016 and that has stood the test of time.

The framework consists of 5 pillars that every organisation needs to adopt to realise the value of data and advanced analytics: strategy, data, technology, people and process. Let me go through each of them, using a road trip as an analogy:

Strategy - is where the company is going: its vision and long-term plan. If you are planning a road trip, it is the map and the route that you are planning to take.
Data - is the fuel for the car. We all know the importance of data for advanced analytics, and the recent advances in AI show that no matter how good your algorithm and hardware are, it’s the data that is the main driver of progress.
Technology - is the engine and the car itself. Tech allows us to take the data, process it, extract insights and suggest actions. In organisations it is usually the end-to-end data architecture (e.g. sources, data lake, end-user systems and data governance), the software and algorithms to process the data and serve the models, and security.
People - are the driver of the car. The car will not move, or will not get to the end point, if the driver doesn’t have the relevant skills. In AA organisations, data scientists, data engineers, translators, MLOps engineers and AI product people are the main drivers of value.
Process - is the set of rules on how to get to the destination, as well as the GPS that constantly tries to find the best route. In an AA organisation it is the ways of working (usually Agile), software engineering, MLOps and data best practices, change management and culture.

Random find - LangChain - framework to build language model applications

This may not be new for people who use large language models all the time, as it has become quite a popular framework lately, but this week I discovered LangChain for myself. It’s quite a young library, with the first commit made in October 2022. The idea behind LangChain is to chain together different pre-built components to help with the development of advanced use cases.

At its core, LangChain consists of several components:

Models - different existing model integrations
Prompts - prompt management and serialisation
Memory - persisting state between calls
Indexes - ways to structure documents so that LLMs can best interact with them
Chains - the pipeline
Agents - to decide what action to take or which model to call
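
To make the chaining idea concrete, here is a minimal sketch in plain Python. It does not use LangChain's actual API: `SimplePrompt`, `SimpleMemory`, `SimpleChain` and `fake_model` are hypothetical names made up for illustration, and the "model" is just a stub that echoes its prompt. It only shows how a prompt, a model and a memory could be wired into one pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it just echoes the prompt."""
    return f"[model answer to: {prompt}]"


@dataclass
class SimplePrompt:
    """Prompt component: a template with named placeholders."""
    template: str

    def format(self, **kwargs: str) -> str:
        return self.template.format(**kwargs)


@dataclass
class SimpleMemory:
    """Memory component: keeps state (prompts and answers) between calls."""
    history: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.history.append(text)


@dataclass
class SimpleChain:
    """Chain component: the pipeline prompt -> model -> memory."""
    prompt: SimplePrompt
    model: Callable[[str], str]
    memory: SimpleMemory

    def run(self, **kwargs: str) -> str:
        text = self.prompt.format(**kwargs)
        answer = self.model(text)
        self.memory.add(text)
        self.memory.add(answer)
        return answer


chain = SimpleChain(
    prompt=SimplePrompt("Summarise this topic: {topic}"),
    model=fake_model,
    memory=SimpleMemory(),
)
result = chain.run(topic="advanced analytics")
print(result)
print(len(chain.memory.history))
```

In a real application the stub would be replaced by one of the model integrations, and the library's own components would take the place of these toy classes.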

I’m planning to play with and explore LangChain more in the next few weeks. Happy chaining!

🛠️ Practical tip - standard folder structure for every analytics project

One of the best low-effort, high-return tweaks that you can make to your analytics project is to set up a standard folder structure across all your projects and teams.

I tend to use the directory structure from the Python package kedro, but any similar structure would work as well. Here’s the structure of a kedro project.

project-dir         # Parent directory of the template
├── .gitignore      # Prevents staging of unnecessary files to `git`
├── conf            # Project configuration files
├── data            # Local project data (not committed to git)
├── docs            # Project documentation
├── logs            # Project output logs (not committed to git)
├── notebooks       # Project-related Jupyter notebooks 
├── pyproject.toml  # Identifies the project root and contains
│                     configuration information
├── README.md       # Project README
├── setup.cfg       # Configuration options for `pytest` when doing
│                     `kedro test` and for the `isort` utility when
│                     doing `kedro lint`
└── src             # Project source code
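
If you don't want to depend on kedro itself, a layout like the one above can be scaffolded with a few lines of standard-library Python. This is only a sketch: `create_skeleton` is a hypothetical helper, not part of kedro, and it creates empty placeholder files rather than the real template contents.

```python
from pathlib import Path
import tempfile

# Directory and file names taken from the tree above.
DIRS = ["conf", "data", "docs", "logs", "notebooks", "src"]
FILES = [".gitignore", "pyproject.toml", "README.md", "setup.cfg"]


def create_skeleton(root: Path) -> Path:
    """Create the standard folders and empty placeholder files under root."""
    root.mkdir(parents=True, exist_ok=True)
    for name in DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return root


# Usage: scaffold a project inside a temporary directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "project-dir")
    created = sorted(p.name for p in project.iterdir())
    print(created)
```

Running the same helper at the start of every project is one cheap way to keep the layout identical across teams.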

A standard directory structure has the following advantages:

Efficiency. An organised directory structure accelerates the development process by minimising the time spent searching for specific files or resources.
Readability and maintainability. A logical and clear structure makes it easier for team members to understand the project's layout, its components and how they interact.
Standardisation. Adopting the same folder structure across all projects makes it easier for all team members to navigate any project, new or old.

🌏 What’s happening in the world of AI?

Reflexion - how models like GPT-4 can further improve themselves based on self-feedback

GPT-4 can beat itself by giving self-feedback. Source: https://github.com/GammaTauAI/reflexion-human-eval

Reflexion is a self-feedback method that has been applied to GPT-4 to significantly improve its already great performance. The idea is simple: use the model itself to critique its output and iteratively derive a better answer. This is very similar to how humans evolve their thinking. The author of the method published the code along with some result comparisons.

This means that, while GPT-4 is quite powerful, we haven’t seen its full power yet.
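
The critique-and-retry loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' implementation: `reflect_loop`, `toy_generate` and `toy_critique` are hypothetical names, and the two "model" functions are simple string rules standing in for what would be LLM calls in a real setting.

```python
from typing import Callable, List, Optional


def reflect_loop(
    task: str,
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Generate an answer, critique it, and retry using the accumulated feedback."""
    feedback: List[str] = []
    answer = generate(task, feedback)
    for _ in range(max_rounds):
        note = critique(task, answer)
        if note is None:  # the critic is satisfied, stop iterating
            break
        feedback.append(note)
        answer = generate(task, feedback)
    return answer


def toy_generate(task: str, feedback: List[str]) -> str:
    """Stub generator: applies whichever fixes the feedback asks for."""
    answer = task
    if "capitalise" in feedback:
        answer = answer[0].upper() + answer[1:]
    if "add full stop" in feedback:
        answer += "."
    return answer


def toy_critique(task: str, answer: str) -> Optional[str]:
    """Stub critic: returns one piece of feedback, or None if the answer is fine."""
    if not answer[0].isupper():
        return "capitalise"
    if not answer.endswith("."):
        return "add full stop"
    return None


final = reflect_loop("reflexion uses self-feedback", toy_generate, toy_critique)
print(final)
```

The key design point is that the feedback list grows across rounds, so each new attempt sees every earlier critique rather than only the latest one.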

Meta released SAM (Segment Anything Model), a model that can cut objects out of any image

This week, Meta open-sourced and hosted SAM, its Segment Anything Model, along with the data sets that were used to train the model. The interactive web UI can be found here.

As seen in the image, this model can segment any picture into relevant objects. It is also interactive, so a human in the loop can add or remove points from the object.

You can read a more detailed TLDR explanation in my article.

Bloomberg developed BloombergGPT, an AI model specifically designed to tackle financial tasks

Bloomberg published a research paper about its large language model (LLM), called BloombergGPT. It was trained on proprietary financial data and contains 50 billion parameters. The model significantly outperforms existing models on financial tasks and has good performance on general LLM benchmarks.

Bloomberg will use this model for sentiment analysis, named entity recognition, news classification, and question answering, among others.

Vicuna, an open-source chatbot model, performs at 90% of ChatGPT quality

You can find Vicuna at https://github.com/lm-sys/FastChat.

My friend Shaan posted a more detailed overview of the model in his latest Let’s Talk Text newsletter.

Stanford published its 2023 AI Index with the latest achievements and trends in AI

The full report is almost 400 pages long but here’s the TLDR version:

Industry races ahead of academia. Until 2014 the best machine learning models were produced in academia, but since then industry has taken the lead. New AI systems get a million times more data than 10 years ago.
AI systems keep improving, but the year-over-year improvement on many benchmarks is marginal.
AI has an increasingly large footprint on the environment. Modern giant AI models require massive infrastructure and are expensive and slow to train, which results in much higher carbon emissions.
AI systems assist and speed up scientific progress when used by researchers. Examples include hydrogen fusion and drug discovery.
Cases of AI misuse are on the rise.
Year-over-year investment in AI decreased after many years of consistent increase.
The proportion of companies adopting AI has plateaued, but companies that adopted AI continue to decrease costs and increase revenues. The number of companies adopting AI has doubled since 2017.

$25M investment of OpenAI into humanoid robot company 1X

1X Tech made this announcement in a Medium post. To me, it feels a little bit futuristic to see the combination of GPT-4 and humanoid robots.

📊 Quick feedback

I love feedback! I’d really appreciate it if you wrote me a short note on how I can improve this newsletter, so that it adds more value for you!

🏞️ My random photos of the week

In this section, I’ll be publishing some random photos from my life. This time, more photos from Melbourne, the city where I live.

Awesome design of a crosswalk in Melbourne, where the tactile paving is also illuminated in red or green, depending on the traffic light colour.

A small but always busy street in Melbourne, Centre Place.

Parrots at the front of my house came to snack on sunflower seeds.

This article reflects my personal views and opinions only, which may be different from the companies and employers that I am associated with.