Herein lie some of my thoughts and resources about neural networks. Because I work for a company that builds models for
computer vision, I have a bit of a professional bias towards image models, but I have tried to represent my
knowledge/opinions about a broader range of subjects here.
What do you think about generative "AI"?
tl;dr - mostly dancing bearware, some novel uses in responsibility laundering
Resources
Text models
For everything else
- Washington Post coverage of the data contained in the 'C4' dataset and how it influences the training of popular
large models. Also allows users to check if arbitrary URLs are part of the dataset. (NOTE: C4 is '''not''' the only
source of training text for the models being discussed, and the authors aren't doing a great job highlighting that,
but it should still be pretty representative)
- How well does ChatGPT speak Japanese? - an April 2023 evaluation of GPT-3.5 and GPT-4 performance on Japanese
language assessments. Also includes an interesting comparison of the number of tokens required to represent the
"Lord's Prayer" in multiple languages. I found the results of the latter particularly surprising.
Misc.
- I gave a talk on the fundamentals of neural networks to Boston Python in March 2023
- 3blue1brown has an excellent series of lessons about the fundamentals of neural networks. Particularly interesting
to me is the lesson on backpropagation for its excellent visualization of the process of adjusting neural network
weights.
Dumping ground
These references are totally unclassified
Writings by others
Academic works
- Using GitHub Copilot for Test Generation in Python: An Empirical Study - we find that 45.28% of test
generated...are passing tests, containing no syntax or runtime errors. The majority (54.72%) of generated tests...are
failing, broken, or empty tests. We observe that tests generated within an existing test code context often mimic
existing test methods
- Scalable Extraction of Training Data from (Production) Language Models - Using only $200 USD worth of queries to
ChatGPT (gpt-3.5-turbo), we are able to extract over 10,000 unique verbatim-memorized training examples. Our
extrapolation to larger budgets (see below) suggests that dedicated adversaries could extract far more data…we estimate
the…memorization of ChatGPT…[at] a gigabyte of training data. In practice we expect it is likely even higher.
- Does GPT-4 Pass the Turing Test?
- "The Fallacy of AI Functionality" - "...fear of misspecified objectives, runaway feedback loops, and AI alignment
presumes the existence of an industry that can get AI systems to execute on any clearly declared objectives, and that
the main challenge is to choose and design an appropriate goal. Needless to say, if one thinks the danger of AI is that
it will work too well, it is a necessary precondition that it works at all."
- "Adversarial Reprogramming of Neural Networks" - "In each [of six cases], we reprogrammed the [classification]
network [trained on ImageNet] to perform three different adversarial tasks: counting squares, MNIST classification,
and CIFAR-10 classification… Our finding…[suggests] that the reprogramming across domains is likely [possible]."
- "Universal and Transferable Adversarial Attacks on Aligned Language Models" - "For Harmful Behaviors, our approach
achieves an attack success rate of 100% on Vicuna-7B and 88% on Llama-2-7B-Chat… we find that the adversarial examples
also transfer to Pythia, Falcon, Guanaco, and surprisingly, to GPT-3.5 (87.9%) and GPT-4 (53.6%), PaLM-2 (66%), and
Claude-2 (2.1%)."
- "Mathematical Capabilities of ChatGPT" - in which ChatGPT and GPT4 largely fail to muster passing performance on a
mathematical problem set, compared to a domain-specific model that achieves nearly 100% performance.
- "Unmasking Clever Hans predictors and assessing what machines really learn" - "...it is important to comprehend
the decision-making process itself...transparency of the what and why in a decision of a nonlinear machine becomes
very effective for the essential task of judging whether the learned strategy is valid and generalizable or whether the
model has based its decision on a spurious correlation in the training data"
- "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" - "LMs with extremely large numbers of
parameters model their training data very closely and can be prompted to output specific information from that
training data"
- "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions" - "In total, we produce 89
different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be
vulnerable."
- "Do Users Write More Insecure Code with AI Assistants?" - "We observed that participants who had access to
[codex-davinci-002] were more likely to introduce security vulnerabilities for the majority of programming tasks, yet
also more likely to rate their insecure answers as secure compared to those in our control group."
- "ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models" - "Over 90% of 1008
generated jokes were the same 25 Jokes."
- "How is ChatGPT's behavior changing over time?" - "We find that the performance and behavior of both GPT-3.5 and
GPT-4 can vary greatly over time."
- "Are Emergent Abilities of Large Language Models a Mirage?" - "For a fixed task and a fixed model family, the
researcher can choose a metric to create an emergent ability or choose a metric to ablate an emergent ability. Ergo,
emergent abilities may be creations of the researcher’s choices, not a fundamental property of the model family on the
specific task"
- "Extracting Training Data from Large Language Models" - "We demonstrate our attack on GPT-2, a language model
trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the
model's training data...we find that larger models are more vulnerable than smaller models."
- "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4" - "We find that these models have memorized books,
both in the public domain and in copyright, and the capacity for memorization is tied to a book’s overall popularity
on the web. This differential in memorization leads to differential in performance for downstream tasks, with better
performance on popular books than on those not seen on the web"
- "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering
Questions" - "Our user study results show that users prefer ChatGPT answers 34.82% of the time. However, 77.27% of
these preferences are incorrect answers"
Lawsuits
The legal status of generative models and their implications for intellectual
property in the US is something I'm trying to keep an eye on. The cases given
below are of particular interest to me.
The New York Times Company v. MICROSOFT CORPORATION
Andersen v. Stability AI Ltd.
Getty Images (US), Inc. v. Stability AI, Inc.
Silverman v. OpenAI, Inc.
Authors Guild v. OpenAI Inc.
Sancton v. OpenAI Inc. et al
Mata v. Avianca, Inc. (closed)
Note: this case is not about machine learning textually, but is included in
this list because it is a notable example of gross misuse of a language
model by plaintiff's counsel to submit falsified documents to the court. This
led to censure of plaintiff's counsel and dismissal of the case.