Henry65 committed on
Commit
cdbc2dc
1 Parent(s): 038eba5

Update README.md

Files changed (1)
  1. README.md +11 -7
README.md CHANGED
@@ -3,22 +3,26 @@ license: mit
3
  tags:
4
  - code-understanding
5
  - unixcoder
 
6
  ---
7
 
8
- # RepoSim
9
 
10
- An approach to compare semantic similarities between Python repositories.
11
 
12
  ## Model Details
13
 
14
- **RepoSim** is a pipeline used to create embeddings for specified Python repositories on GitHub. For each repository, it extracts and encodes all functions' source code and docstrings into embeddings, then averages them to get the mean code embeddings and the mean docstring embeddings, which can be used to perform various tasks such as cosine similarity comparison.
 
 
 
15
 
16
  ### Model Description
17
 
18
- The model used by **RepoSim** is **UniXcoder** fine-tuned on [code search task](https://github.com/microsoft/CodeBERT/tree/master/UniXcoder/downstream-tasks/code-search), using the [AdvTest](https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-Adv) dataset.
19
 
20
- - **Pipeline developed by:** [Lazyhope](https://huggingface.co/Lazyhope)
21
- - **Repository:** [RepoSim](https://github.com/RepoAnalysis/RepoSim)
22
  - **Model type:** **code understanding**
23
  - **Language(s):** **Python**
24
  - **License:** **MIT**
@@ -82,4 +86,4 @@ Many thanks to authors of the UniXcoder model and the AdvTest dataset, as well a
82
 
83
  ## Authors
84
  - **Zihao Li** (https://github.com/lazyhope)
85
- - **Rosa Filgueira** (https://www.rosafilgueira.com)
 
3
  tags:
4
  - code-understanding
5
  - unixcoder
6
+ pipeline_tag: feature-extraction
7
  ---
8
 
9
+ # RepoSim4Py
10
 
11
+ An embedding-based tool for comparing the semantic similarity of Python repositories, using several kinds of information from each repository.
12
 
13
  ## Model Details
14
 
15
+ **RepoSim4Py** is a pipeline built on the HuggingFace platform that generates embeddings for specified GitHub Python repositories.
16
+ For each Python repository, it generates embeddings at different levels based on the source code, code documentation, requirements, and README files within the repository.
17
+ By taking the mean of these embeddings, a repository-level mean embedding is generated.
18
+ These embeddings can be used to compute semantic similarity at different levels, for example by comparing them with cosine similarity.
19
 
20
  ### Model Description
21
 
22
+ The model used by **RepoSim4Py** is **UniXcoder** fine-tuned on [code search task](https://github.com/microsoft/CodeBERT/tree/master/UniXcoder/downstream-tasks/code-search), using the [AdvTest](https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-Adv) dataset.
23
 
24
+ - **Pipeline developed by:** [Henry65](https://huggingface.co/Henry65)
25
+ - **Repository:** [RepoSim4Py](https://github.com/RepoMining/RepoSim4Py)
26
  - **Model type:** **code understanding**
27
  - **Language(s):** **Python**
28
  - **License:** **MIT**
 
86
 
87
  ## Authors
88
  - **Zihao Li** (https://github.com/lazyhope)
89
+ - **Rosa Filgueira** (https://www.rosafilgueira.com)
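
Note on the updated Model Details text: the README describes averaging several embeddings into a repository-level mean embedding and then comparing repositories with cosine similarity. The sketch below illustrates only that arithmetic in plain Python. It is not the RepoSim4Py implementation: the function names `mean_embedding` and `cosine_similarity` and the toy three-dimensional vectors are assumptions made for illustration, and real embeddings would come from the fine-tuned UniXcoder model.

```python
import math
from typing import List, Sequence


def mean_embedding(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise (hypothetical helper)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy stand-ins for per-level embeddings (e.g. code, documentation) of two
# repositories; the values are made up purely for illustration.
repo_a_levels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
repo_b_levels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

# Repository-level mean embeddings, then their cosine similarity.
repo_a = mean_embedding(repo_a_levels)
repo_b = mean_embedding(repo_b_levels)
similarity = cosine_similarity(repo_a, repo_b)
print(similarity)
```

Both toy repositories average to the same vector, so the printed similarity is close to 1.0 (up to floating-point rounding). Orthogonal mean embeddings would instead give a value close to 0.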