From b85028522d6e924473159ba0da3543fc174d2ded Mon Sep 17 00:00:00 2001
From: Georgi Gerganov <ggerganov@gmail.com>
Date: Sat, 11 Mar 2023 00:09:19 +0200
Subject: [PATCH] Update README.md

---
 README.md | 86 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 34 deletions(-)

diff --git a/README.md b/README.md
index 8103cf2..e5f5b53 100644
--- a/README.md
+++ b/README.md
@@ -17,11 +17,11 @@ The main goal is to run the model using 4-bit quantization on a MacBook.
 
 This was hacked in an evening - I have no idea if it works correctly.
 
-So far, I've tested just the 7B model and the generated text starts coherently, but typically degrades significanlty after ~30-40 tokens.
+So far, I've tested just the 7B model.
 Here is a "typical" run:
 
 ```java
-make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
+make -j && ./main -m ../LLaMA-4bit/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
 I llama.cpp build info: 
 I UNAME_S:  Darwin
 I UNAME_P:  arm
@@ -32,33 +32,16 @@ I LDFLAGS:   -framework Accelerate
 I CC:       Apple clang version 14.0.0 (clang-1400.0.29.202)
 I CXX:      Apple clang version 14.0.0 (clang-1400.0.29.202)
 
-c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread main.cpp ggml.o utils.o -o main  -framework Accelerate
-./main -h
-usage: ./main [options]
-
-options:
-  -h, --help            show this help message and exit
-  -s SEED, --seed SEED  RNG seed (default: -1)
-  -t N, --threads N     number of threads to use during computation (default: 4)
-  -p PROMPT, --prompt PROMPT
-                        prompt to start generation with (default: random)
-  -n N, --n_predict N   number of tokens to predict (default: 128)
-  --top_k N             top-k sampling (default: 40)
-  --top_p N             top-p sampling (default: 0.9)
-  --temp N              temperature (default: 0.8)
-  -b N, --batch_size N  batch size for prompt processing (default: 8)
-  -m FNAME, --model FNAME
-                        model path (default: models/llama-7B/ggml-model.bin)
-
-main: seed = 1678476633
-llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
+make: Nothing to be done for `default'.
+main: seed = 1678486056
+llama_model_load: loading model from '../LLaMA-4bit/7B/ggml-model-q4_0.bin' - please wait ...
 llama_model_load: n_vocab = 32000
 llama_model_load: n_ctx   = 512
 llama_model_load: n_embd  = 4096
 llama_model_load: n_mult  = 256
 llama_model_load: n_head  = 32
 llama_model_load: n_layer = 32
-llama_model_load: n_rot   = 64
+llama_model_load: n_rot   = 128
 llama_model_load: f16     = 2
 llama_model_load: n_ff    = 11008
 llama_model_load: ggml ctx size = 4529.34 MB
@@ -66,24 +49,59 @@ llama_model_load: memory_size =   512.00 MB, n_mem = 16384
 llama_model_load: .................................... done
 llama_model_load: model size =  4017.27 MB / num tensors = 291
 
-main: prompt: 'If'
-main: number of tokens in prompt = 2
+main: prompt: 'Building a website can be done in 10 simple steps:'
+main: number of tokens in prompt = 15
      1 -> ''
-  3644 -> 'If'
+  8893 -> 'Build'
+   292 -> 'ing'
+   263 -> ' a'
+  4700 -> ' website'
+   508 -> ' can'
+   367 -> ' be'
+  2309 -> ' done'
+   297 -> ' in'
+ 29871 -> ' '
+ 29896 -> '1'
+ 29900 -> '0'
+  2560 -> ' simple'
+  6576 -> ' steps'
+ 29901 -> ':'
 
 sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000
 
 
-If you are a fan of the original Star Wars trilogy, then you'll want to see this.
-If you don't know your Star Wars lore, this will be a huge eye-opening and you will be a little confusing.
-Awesome movie. [end of text]
-
+Building a website can be done in 10 simple steps:
+1) Select a domain name and web hosting plan
+2) Complete a sitemap
+3) List your products
+4) Write product descriptions
+5) Create a user account
+6) Build the template
+7) Start building the website
+8) Advertise the website
+9) Provide email support
+10) Submit the website to search engines
+A website is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
+The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user's browser.
+The web pages are stored in a web server. The web server is also called a host. When the website is accessed, it is retrieved from the server and displayed on the user's computer.
+A website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
+A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user's screen.
+A website can also be viewed on different devices such as desktops, tablets and smartphones.
+Hence, to have a website displayed on a browser, the website must be hosted.
+A domain name is an address of a website. It is the name of the website.
+The website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
+A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user’s screen.
+A website can also be viewed on different devices such as desktops, tablets and smartphones. Hence, to have a website displayed on a browser, the website must be hosted.
+A domain name is an address of a website. It is the name of the website.
+A website is an address of a website. It is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
+The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user’s browser.
+A website is known as a website when it is hosted
 
 main: mem per token = 14434244 bytes
-main:     load time =  1313.77 ms
-main:   sample time =     6.17 ms
-main:  predict time =  3271.53 ms / 54.53 ms per token
-main:    total time =  4797.98 ms
+main:     load time =  1332.48 ms
+main:   sample time =  1081.40 ms
+main:  predict time = 31378.77 ms / 61.41 ms per token
+main:    total time = 34036.74 ms
 ```
 
 ## Usage