AI assistants in Emacs. Don't use ChatGPT. Help Open Science.
Everybody seems to be very excited about generative AI models these days. The availability of Large Language Models (LLMs) through conversational interfaces like ChatGPT, and of image generation tools like DALL-E or Stable Diffusion, has brought generative AI to the masses. GitHub Copilot, also based on the GPT family of models, has been available for a while now, but it has remained a niche tool, for programmers only.
Of course, Emacs being the best text-based interface to a computer, it is also the best interface to generative AI models, which are driven through textual prompts. Since everything in Emacs is a text buffer, sending and receiving text via an API is straightforward (if you know Emacs Lisp), as the sketch below illustrates.
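As a concrete illustration, here is a minimal, hypothetical sketch of such an interaction using only the built-in url and json libraries. The endpoint http://localhost:8080/generate and the JSON fields inputs and generated_text are placeholder assumptions, not any particular service's API.

```emacs-lisp
;; Minimal sketch: send a prompt to a local text-generation endpoint and
;; insert the reply at point.  The URL and the JSON field names below are
;; placeholders, not a real service's API.
(require 'url)
(require 'url-http)
(require 'json)

(defun my/ask-model (prompt)
  "Send PROMPT to a local inference endpoint and insert the reply at point."
  (interactive "sPrompt: ")
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string (json-encode `(("inputs" . ,prompt))) 'utf-8))
         (reply
          (with-current-buffer
              (url-retrieve-synchronously "http://localhost:8080/generate")
            (goto-char url-http-end-of-headers)   ; skip the HTTP headers
            (prog1 (alist-get 'generated_text (json-read))
              (kill-buffer)))))
    (insert reply)))
```

Bound to a key, a command like this would let you query any model that exposes an HTTP interface, local or remote, without leaving the buffer you are working in.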
It is therefore not a surprise that there are many Emacs packages for using ChatGPT and Copilot. Just to list a few [1]:
Recently, David Wilson at System Crafters did a live stream showing some of the capabilities of these packages. To be honest, I was not impressed by the code generated by the LLMs, but I can understand that many people may find them useful.
At the end of the stream, David wanted to address the problems and issues with using these models. Of course, being a programmer, David likes recursion, so he asked ChatGPT about that. The LLM answered, as usual, with a balanced view of pros and cons. It is also usual for these models, in my opinion, to give awfully banal answers. There are legal issues, copyright issues (in both senses: the models are trained on copyrighted material, and the copyright status of their outputs is not well defined), ethical problems, etc.
As always, the Silicon Valley tech firms have not waited for these issues to be settled before deploying the technology to the public. They impose their vision regardless of the consequences. Unfortunately, most programmers like the tech so much that they don't stop to think before adopting it and helping to spread it.
Emacs being one of the most important contributions of the Free Software community, it may be surprising that some of its users are so keen to ride the latest Trojan horse of technofeudalism. Fermin pointed out a similar kind of issue with the Language Server Protocol, but that was not a real problem, since LSP servers run locally and we have free software implementations of them. This is not the case for ChatGPT or Copilot. We only have an API that can be taken down at any moment. Worse than that, as pointed out, using these LLMs means that we are helping to train them. Every time David, in his demo, wrote "… the code is wrong because of …, can you fix it?", he was providing feedback for the reinforcement learning of ChatGPT.
So what can we do? Stop using these tools and lose our programming or writing jobs because we will be less productive than those who use them?
Maybe we can do what GNU hackers and other Free Software activists have always done: implement libre tools that are digital common goods, tools that free and empower users. Building and training big AI models is very costly and may be much more difficult than building an OS kernel like Linux, a compiler like GCC, or an editor like Emacs, but fortunately there are already alternatives to ChatGPT.
The best starting point could be helping the BigScience Project by using their Bloom model from Emacs to train and improve it. Hints on how to install Bloom locally and use it with Python can be found in this blog post. There are also initiatives to use federated learning [2] to improve Bloom. The BigCode project targets code generation: it is to BigScience what Copilot is to ChatGPT. You can play with it here. There is a VSCode plugin for their StarCoder model, but no Emacs package yet. Isn't that a shame? A minimal integration would not need much more than the snippet sketched below.
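To make the point that such a package is within reach, here is a hedged sketch of what a bare-bones completion command could look like, assuming a StarCoder or Bloom model served locally behind a small HTTP wrapper. The endpoint and the JSON field names are placeholder assumptions, not BigCode's published API.

```emacs-lisp
;; Hypothetical sketch of a code-completion command for a locally served
;; StarCoder or Bloom model.  The endpoint and JSON field names are
;; placeholder assumptions, not BigCode's actual API.
(require 'url)
(require 'url-http)
(require 'json)

(defun my/local-model-complete ()
  "Send the text before point to a local model and insert its completion."
  (interactive)
  (let* ((context (buffer-substring-no-properties
                   (max (point-min) (- (point) 2000)) (point)))
         (url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string (json-encode `(("inputs" . ,context))) 'utf-8))
         (completion
          (with-current-buffer
              (url-retrieve-synchronously "http://localhost:8080/generate")
            (goto-char url-http-end-of-headers)   ; skip the HTTP headers
            (prog1 (alist-get 'generated_text (json-read))
              (kill-buffer)))))
    (when completion
      (insert completion))))
```

Everything a dedicated package would add on top of this (asynchronous requests, overlays to preview the suggestion, prompt templates) is ordinary Emacs Lisp work.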
BigScience is Open Science, while OpenAI's is not open at all: it is proprietary technology built from common digital goods harvested on the internet. It shouldn't be difficult for us, Emacs users, to choose who we want to help.