Git Techniques: .gitignore, Unstaging Files, and Undoing Changes

2022-05-23 21:11:39

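1. Adding common .gitignore entries

1.1 .gitignore templates

There is a .gitignore template for nearly every language, and you can pick one when creating a repository on GitHub (you can find them in the collection of .gitignore templates that GitHub provides). For example, the .gitignore template for Python looks like this: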

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

1.2 Adding more .gitignore entries

But these are often not enough. For example, if we develop with VSCode on macOS, we usually also need to add the following entries:

# IDE - VSCode
.vscode/

# OS generated files
.DS_Store

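Here .vscode/ ignores the hidden .vscode directory, which holds the project's configuration files (note that the directory itself is ignored along with its contents, which differs from the semantics of Linux commands such as cp test/ .; see my blog post "Linux: Some Pitfalls of Extracting, Copying, and Moving Files"), and .DS_Store ignores the hidden file that macOS uses to store a directory's custom attributes.

As another example, take a machine learning project. The data (kept in the data directory) and the models (kept in the model directory) are usually extremely large, and we do not want to put them under version control, so we might be inclined to add the following entries: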

# data files
data/*

# model files
model/*

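Semantically, data/* and model/* ignore all files and subdirectories under the data directory and under the model directory (but not the data and model directories themselves). However, we find that the data and model directories, which are empty as far as Git is concerned, were not actually added to the project by git add: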

(base) orion-orion@MacBook-Pro Learn-Git % git add data                  
(base) orion-orion@MacBook-Pro Learn-Git % git add model                 
(base) orion-orion@MacBook-Pro Learn-Git % git status                    
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

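This is because Git does not track empty directories. But what if we want to preserve the data and model directory structure? It is simple: we add a .gitkeep file to each of the data and model directories (the name is only a convention, not something Git treats specially), and then negate .gitkeep in the .gitignore file (that is, tell Git not to ignore it):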

# data files
data/*
!data/.gitkeep

# model files
model/*
!model/.gitkeep

We can see that, thanks to the hidden .gitkeep files, the otherwise empty directories can now be added with git add as expected:

(base) orion-orion@MacBook-Pro Learn-Git % git add data 
(base) orion-orion@MacBook-Pro Learn-Git % git add model
(base) orion-orion@MacBook-Pro Learn-Git % git status   
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   data/.gitkeep
        new file:   model/.gitkeep
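The steps above can be reproduced in a throwaway repository. The following is only a sketch: the temporary directory and the file name train.csv are made up for illustration and are not part of the original project.

```shell
# Sketch: keep otherwise-empty data/ and model/ directories in a scratch repository.
set -e
repo=$(mktemp -d)            # scratch directory, not a real project
cd "$repo"
git init -q

# Ignore everything inside data/ and model/ except the .gitkeep placeholder.
printf '%s\n' 'data/*' '!data/.gitkeep' 'model/*' '!model/.gitkeep' > .gitignore

mkdir data model
touch data/.gitkeep model/.gitkeep   # empty placeholder files
touch data/train.csv                 # hypothetical large file that should stay ignored

git add data model
staged=$(git diff --cached --name-only)   # paths now in the staging area
echo "$staged"
```

Only the two .gitkeep files should be listed; data/train.csv stays out of the staging area because data/* still applies to it.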

But note that writing it as follows does not work:

# data files
data/
!data/.gitkeep

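This is because data/ ignores the data directory itself, so Git never looks inside it, and the !data/.gitkeep rule therefore has no effect.

As an aside, if we only want to ignore the .csv files directly under the data directory, we can write: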
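One way to compare the two variants is git check-ignore, which reports whether a path is ignored. The sketch below runs in a scratch repository created only for this comparison; the variable names v1 and v2 are illustrative.

```shell
# Sketch: compare "data/*" and "data/" when combined with "!data/.gitkeep".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir data
touch data/.gitkeep

# Variant 1: ignore the contents of data/, then re-include the placeholder.
printf '%s\n' 'data/*' '!data/.gitkeep' > .gitignore
if git check-ignore -q data/.gitkeep; then v1=ignored; else v1=kept; fi

# Variant 2: ignore the directory itself; the negation can no longer take effect.
printf '%s\n' 'data/' '!data/.gitkeep' > .gitignore
if git check-ignore -q data/.gitkeep; then v2=ignored; else v2=kept; fi

echo "data/* variant: $v1"   # kept
echo "data/  variant: $v2"   # ignored
```

Running git check-ignore -v on a path additionally prints the .gitignore line responsible, which helps when several rules interact.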

# data files
data/*.csv
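Note that the * in data/*.csv does not cross a directory separator, so .csv files in subdirectories of data are not covered. A minimal sketch, using made-up file names in a scratch repository, contrasts it with data/**/*.csv, which matches at any depth:

```shell
# Sketch: which paths do data/*.csv and data/**/*.csv ignore?
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p data/raw
touch data/a.csv data/notes.txt data/raw/b.csv   # hypothetical files

printf '%s\n' 'data/*.csv' > .gitignore
shallow=$(git check-ignore data/a.csv data/notes.txt data/raw/b.csv || true)

printf '%s\n' 'data/**/*.csv' > .gitignore
deep=$(git check-ignore data/a.csv data/notes.txt data/raw/b.csv || true)

echo "data/*.csv ignores:";    echo "$shallow"   # data/a.csv
echo "data/**/*.csv ignores:"; echo "$deep"      # data/a.csv and data/raw/b.csv
```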

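2. Removing staged files

2.1 About tracking and staging

In Git, a file can live in three areas: the working directory, the staging area (also called the index), and the Git repository (which can be viewed as a tree of commits). The relationship among the three is shown in the figure below:

[Figure: working directory, staging area, and Git repository]

When we add a file to the project directory, we are really adding it to the working directory.

Once a directory or file has been passed to git add, it is tracked and placed in the staging area. If we modify it afterwards, Git reports Changes not staged for commit: modified: README.md, and we need to run git add again to stage the change: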

(base) orion-orion@MacBook-Pro Learn-Git % echo "new version" > README.md 
(base) orion-orion@MacBook-Pro Learn-Git % git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")

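The relationship among the four so-called file states, untracked, unmodified, modified, and staged, is shown below:

[Figure: transitions among the untracked, unmodified, modified, and staged states]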
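The four states can also be observed directly with git status --porcelain, whose two-letter codes are stable for scripting. This is a sketch in a scratch repository; the user name and email are placeholders needed only so that git commit works.

```shell
# Sketch: walk one file through untracked -> staged -> unmodified -> modified.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "v1" > README.md
s_untracked=$(git status --porcelain)      # "?? README.md"

git add README.md
s_staged=$(git status --porcelain)         # "A  README.md"

git commit -q -m "add README"
s_unmodified=$(git status --porcelain)     # empty: nothing to report

echo "new version" > README.md
s_modified=$(git status --porcelain)       # " M README.md"

printf '[%s]\n' "$s_untracked" "$s_staged" "$s_unmodified" "$s_modified"
```

In the two-letter code, the first column describes the staging area and the second the working directory.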

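2.2 Clearing staged files

Now suppose we forgot to write a .gitignore and have already staged every file and subdirectory under the directory with git add -A or git add . (since Git 2.0 the two are equivalent when run from the repository root). Among them are large log files or compiled files such as *.a. How do we remove these files from the staging area so that they are no longer tracked? The git rm --cached command does the job, for example: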

git rm --cached README.md

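Note that the --cached option is required. Plain git rm not only removes the file from the staging area but also deletes it from disk. For more on command-line options, see my blog post "Linux: Conventions for Passing Shell Arguments to Executables".

The effect of the command is as follows: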
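A quick way to confirm that --cached leaves the file on disk is to try it on a file that was staged by mistake. The sketch below uses a scratch repository and a made-up app.log.

```shell
# Sketch: git rm --cached unstages a file but keeps it in the working directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "debug output" > app.log        # hypothetical log file staged by mistake
git add app.log

git rm -q --cached app.log           # remove from the staging area only

if [ -f app.log ]; then on_disk=yes; else on_disk=no; fi
status=$(git status --porcelain)     # "?? app.log": untracked again

echo "still on disk: $on_disk"
echo "$status"
```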

(base) orion-orion@MacBook-Pro Learn-Git % git rm --cached README.md 
rm 'README.md'
(base) orion-orion@MacBook-Pro Learn-Git % git status               
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    README.md

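Notice Changes to be committed: deleted: README.md. This means that once we run git rm --cached and commit, the file is also removed from the committed tree. If we only want to take it out of the staging area, we can use the following command:

git reset HEAD README.md

This command is equivalent to git reset --mixed HEAD README.md (--mixed is the default mode; there is also a --hard mode, which we cover in Section 3.3). Newer versions of Git also offer git restore --staged <file> for the same purpose, as the hint in the git status output above suggests. The result is as follows: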

(base) orion-orion@MacBook-Pro Learn-Git % git reset HEAD *.md     
Unstaged changes after reset:
M       README.md
(base) orion-orion@MacBook-Pro Learn-Git % git status              
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")

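Notice Changes not staged for commit: modified: README.md. This shows that the command only took README.md out of the staging area: the version recorded by the last commit is still there, and the edits made since that commit remain in the working directory as unstaged changes.

To recursively remove all files and subdirectories under the current directory from the staging area (and, after the next commit, from the committed tree), we can write: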
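The same behavior can be checked in a scratch repository. This sketch assumes a placeholder identity and a README.md created only for the demonstration; it records the status before and after the reset and confirms that the edited content survives.

```shell
# Sketch: git reset HEAD -- <file> unstages a change without touching the file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "v1" > README.md
git add README.md
git commit -q -m "add README"

echo "new version" > README.md
git add README.md
before=$(git status --porcelain)     # "M  README.md": change is staged

git reset -q HEAD -- README.md       # move the change out of the staging area
after=$(git status --porcelain)      # " M README.md": change is unstaged
content=$(cat README.md)             # still "new version"

printf '[%s]\n[%s]\n%s\n' "$before" "$after" "$content"
```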

git rm -r --cached . 

Note that this command is very dangerous and heavy-handed. It is generally better to name specific directories or files.
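One cautious approach is to preview the removal with the --dry-run option of git rm and to limit it to a specific directory. The sketch below uses a scratch repository with a made-up build directory and placeholder identity.

```shell
# Sketch: preview, then untrack, a single directory instead of everything.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

mkdir build
echo "int main(void) { return 0; }" > main.c
echo "binary" > build/libfoo.a             # hypothetical build artifact
git add -A
git commit -q -m "initial"

preview=$(git rm -r --cached --dry-run build)   # lists what would be removed
tracked_before=$(git ls-files)                  # nothing has changed yet

git rm -r -q --cached build                     # now actually untrack build/
tracked_after=$(git ls-files)

echo "$preview"
echo "$tracked_after"
```

After the real removal, build/libfoo.a is still on disk but no longer tracked, so adding build/ to .gitignore keeps it from being staged again.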

3. Amending and undoing git commit

3.1 Viewing the commit history

The git log command shows the project's commit history:

(base) orion-orion@MacBook-Pro Learn-Git % git log
commit 37a35d36eaf8b56c9e7b719c3c7576f3251cee36 (HEAD -> main)
Author: orion-orion <[email protected]>
Date:   Mon May 23 14:15:21 2022 +0800

    modify .gitignore

commit ab7bf6e2c400c8d775cc3bc56928c7748c63c8f8
Author: orion-orion <[email protected]>
Date:   Mon May 23 10:08:08 2022 +0800

    add .gitignore

commit 146c68e12fd2aebed8b38dd5cf95621f800fe4aa (origin/main, origin/HEAD)
Author: 獵戶座 <[email protected]>
Date:   Sun May 22 09:48:22 2022 +0800

    Initial commit

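With no arguments, git log lists all commits in reverse chronological order, so the most recent ones appear first. As you can see, the command lists each commit's SHA-1 checksum, the author's name and email address, the commit date, and the commit message (if the address shown is a GitHub noreply address instead of your real one, your email is set to private on GitHub, and you need to change that setting if you want your real address to appear).

3.2 Amending a commit

Now we have modified .gitignore again. We do not want to create yet another commit; instead we want to fold the change into the latest commit, modify .gitignore, to keep the commit history concise. We can use the following commands: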
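When the full output is too verbose, git log accepts formatting options. The sketch below builds a two-commit scratch repository (file name, messages, and identity are placeholders) and shows --oneline and a custom --pretty format.

```shell
# Sketch: compact views of the commit history.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "a" > a.txt
git add a.txt
git commit -q -m "first"
echo "b" >> a.txt
git commit -q -am "second"

git log --oneline                              # abbreviated hash + subject per line
subjects=$(git log --pretty=format:'%s')       # subjects only, newest first
authors=$(git log -1 --pretty=format:'%an <%ae>')
count=$(git rev-list --count HEAD)             # number of commits reachable from HEAD

echo "$subjects"
echo "$authors"
echo "$count"
```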

(base) orion-orion@MacBook-Pro Learn-Git % git add .gitignore
(base) orion-orion@MacBook-Pro Learn-Git % git commit --amend

Then, in the commit message editor, enter modify .gitignore, and save and quit (:wq! in Vim):

modify .gitignore

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date:      Mon May 23 14:15:21 2022 +0800
#
# On branch main
# Your branch is ahead of 'origin/main' by 2 commits.
#   (use "git push" to publish your local commits)
#
# Changes to be committed:
#       modified:   .gitignore
#       new file:   data/.gitkeep
#       new file:   model/.gitkeep
#
# Changes not staged for commit:
#       modified:   README.md
#
                          
:wq!

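We can see that the total number of commits has not changed, and the date shown for the last commit has not changed either (git log shows the author date, which --amend preserves), but the new changes have been folded in (the SHA-1 checksum has changed):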

(base) orion-orion@MacBook-Pro Learn-Git % git log           
commit a0dfeff409494165bdff60c27b24fad2bc0ed0ad (HEAD -> main)
Author: orion-orion <[email protected]>
Date:   Mon May 23 14:15:21 2022 +0800

    modify .gitignore

commit ab7bf6e2c400c8d775cc3bc56928c7748c63c8f8
Author: orion-orion <[email protected]>
Date:   Mon May 23 10:08:08 2022 +0800

    add .gitignore

commit 146c68e12fd2aebed8b38dd5cf95621f800fe4aa (origin/main, origin/HEAD)
Author: 獵戶座 <[email protected]>
Date:   Sun May 22 09:48:22 2022 +0800

    Initial commit
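If the commit message does not need to change, the editor step can be skipped with --no-edit. The following sketch, in a scratch repository with a placeholder identity, checks that amending keeps the number of commits and the message but produces a new hash.

```shell
# Sketch: fold a follow-up change into the latest commit without opening an editor.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "modify .gitignore"
old=$(git rev-parse HEAD)
n_before=$(git rev-list --count HEAD)

echo "data/*" >> .gitignore                # the change we forgot
git add .gitignore
git commit -q --amend --no-edit            # reuse the previous message

new=$(git rev-parse HEAD)
n_after=$(git rev-list --count HEAD)
msg=$(git log -1 --pretty=format:'%s')

echo "commits: $n_before -> $n_after"
echo "hash changed: $old -> $new"
```

Because the hash changes, amending a commit that has already been pushed rewrites shared history, so it is best kept to local commits.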

3.3 Undoing a git commit

Now suppose we want to undo a git commit. We return to the git reset command, but this time we use git reset --hard:

(base) orion-orion@MacBook-Pro Learn-Git %  git reset --hard HEAD^1
HEAD is now at ab7bf6e add .gitignore
(base) orion-orion@MacBook-Pro Learn-Git % git log                 
commit ab7bf6e2c400c8d775cc3bc56928c7748c63c8f8 (HEAD -> main)
Author: orion-orion <[email protected]>
Date:   Mon May 23 10:08:08 2022 +0800

    add .gitignore

commit 146c68e12fd2aebed8b38dd5cf95621f800fe4aa (origin/main, origin/HEAD)
Author: 獵戶座 <[email protected]>
Date:   Sun May 22 09:48:22 2022 +0800

    Initial commit

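In this command, HEAD^1 (or simply HEAD^) refers to the first parent of the current commit, so the commit history is rolled back to the state right after the previous commit. To go back further, use HEAD~2, HEAD~3, and so on; HEAD^2 does not mean two commits back but the second parent of a merge commit.
Be careful, though: the --hard flag is the only dangerous way to use reset, and it is one of the very few cases where Git actually destroys data. Any other invocation of reset can be undone fairly easily, but the --hard option cannot, because it forcibly overwrites files in the working directory. Committed work can usually still be recovered through the reflog, but uncommitted changes are lost for good.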
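The sketch below illustrates the revision syntax and what --hard does and does not destroy. It runs in a scratch repository with three placeholder commits; ORIG_HEAD is the reference Git records before a reset moves HEAD.

```shell
# Sketch: HEAD^ vs HEAD~2, and what survives git reset --hard.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

for n in 1 2 3; do
  echo "commit $n" >> file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

parent=$(git log -1 --pretty=format:'%s' 'HEAD^')        # one commit back
grandparent=$(git log -1 --pretty=format:'%s' 'HEAD~2')  # two commits back

git reset -q --hard 'HEAD^'                # drop the latest commit
after_reset=$(git log -1 --pretty=format:'%s')

git reset -q --hard ORIG_HEAD              # the dropped commit is still recoverable
recovered=$(git log -1 --pretty=format:'%s')

echo "uncommitted" >> file.txt             # a change that was never committed
git reset -q --hard HEAD                   # this edit is overwritten and lost
if grep -q uncommitted file.txt; then survived=yes; else survived=no; fi

echo "$parent / $grandparent / $after_reset / $recovered / $survived"
```

If the working-directory changes should be kept while undoing a commit, git reset --soft or the default --mixed mode is the gentler choice.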

References