Ivan Vari

A minimalist Sysop/Devops Craftsman

Script to Clone SaltStack Formulas From GitHub

I am heavily into Salt infrastructure management at the moment and want to make use of all the available community-written formulas. Luckily, the SaltStack group maintains a collection of excellent formulas on their GitHub page, and they are a great source of states, ideas, best practices, and so on. I started by cloning the ones I really needed. Later I realized I might need others in the near future, so why not clone all of them and keep a local copy for my development?

The pages are updated fairly regularly, and more and more people are contributing to the project, which is great. However, finding new formulas by hand became tedious, and I needed an automated way to keep up to date with the changes.

I could not find anything off the shelf, so I came up with my own solution in Python.

salt_github_formulas.py
#!/usr/bin/env python
"""
    Script to find GitHub-hosted SaltStack formulas and keep up-to-date local copies of them.
"""

import urllib2
import re
import os
import subprocess


__author__ = "Ivan Vari"
__credits__ = ["Oliver Drake"]
__license__ = "GPLv3"
__version__ = None
__maintainer__ = "Ivan Vari"


URL = "https://github.com/saltstack-formulas"
PATTERN = r'a href="(/saltstack.*)"\s'
HOME = '{0}/development/saltstack-formulas'.format(os.environ['HOME'])


def main():
    """
    Main function and control flow.
    """

    # print strings with color
    # http://pythonhosted.org/ANSIColors-balises/ANSIColors.html
    colorgrn = "\033[01;32m{0}\033[00m"
    colorwht = "\033[1;37m{0}\033[00m"

    # margin for aligned printing
    width = 50

    # default loop controls
    lastpage = 1
    page = 1

    # until we reach lastpage
    while lastpage >= page:
        page_url = URL + '?page={0}'.format(page)

        connection = urllib2.Request(page_url)
        response = urllib2.urlopen(connection)

        # find all formula-references on the current page
        for match in re.finditer(PATTERN, response.read()):
            url = 'https://github.com' + match.group(1)

            # if 1st page, extract last page id and update loop controls
            if page == 1 and '?page=' in url:
                ref = re.findall(r'page=\d*', url)
                lastpage = int(max(ref).split('=')[1])

            # then ignore the page id completely
            if '?page=' not in url:
                # extract the formula name from the URL
                formula = url.split('/')[-1]
                repocopy = os.path.join(HOME, formula)

                # default assumes cloning
                cmd = 'git clone {0} {1}'.format(url, repocopy)
                exec_dir = '/'

                # if local copy found, run git pull from inside that copy
                if os.path.exists(repocopy):
                    cmd = 'git pull'
                    exec_dir = repocopy

                process = subprocess.Popen(cmd.split(),
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           cwd=exec_dir)

                # default output definition
                result = '[OK]'
                flag = ''

                # override output definition based on git output message
                for line in process.stdout.readlines():
                    if 'Cloning' in line:
                        result = '[NEW]'
                        flag = '*'
                    elif 'Updating' in line:
                        result = '[UP]'
                        flag = '+'

                margin = width - (len(result) + len(flag))

                print '[{0}]: '.format(formula).ljust(margin, ' ') + '=>' + \
                      colorwht.format(flag) + colorgrn.format(result)
        page += 1


# boilerplate
if __name__ == "__main__":
    main()

This essentially parses the HTML of the front page to work out the available page numbers, then fetches each page individually, filters out all formula references, and runs git clone or git pull on each one, depending on whether we already have a local copy.
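
The two decisions described above (which links are formulas and which are page references, and whether to clone or pull) can be tried in isolation without touching the network. The following is only a minimal sketch, not the script itself: it is written for Python 3, the helper names `parse_listing` and `git_command` are hypothetical, the sample HTML is made up for illustration, and the regular expression is a stricter variant of `PATTERN` rather than the original. Unlike the script, it compares page numbers as integers.

```python
import os
import re

# Stricter variant of PATTERN from the script (an assumption, not the
# original): stop at the closing quote instead of matching greedily.
LINK_PATTERN = re.compile(r'a href="(/saltstack[^"]*)"')


def parse_listing(html):
    """Return (formula_paths, last_page) found in one listing page."""
    formulas = []
    last_page = 1
    for match in LINK_PATTERN.finditer(html):
        path = match.group(1)
        page_ref = re.search(r'[?&]page=(\d+)', path)
        if page_ref:
            # pagination link: remember the highest page number seen
            last_page = max(last_page, int(page_ref.group(1)))
        else:
            # anything else under /saltstack... is treated as a formula
            formulas.append(path)
    return formulas, last_page


def git_command(url, repocopy):
    """Return (argv, cwd): pull inside an existing copy, else clone."""
    if os.path.exists(repocopy):
        return ['git', 'pull'], repocopy
    return ['git', 'clone', url, repocopy], None


# Made-up markup for illustration; GitHub's real HTML may differ.
SAMPLE_HTML = (
    '<a href="/saltstack-formulas/nginx-formula" class="x">nginx</a>\n'
    '<a href="/saltstack-formulas/mysql-formula" class="x">mysql</a>\n'
    '<a href="/saltstack-formulas?page=2" rel="next">2</a>\n'
    '<a href="/saltstack-formulas?page=12" >12</a>\n'
)

formulas, last_page = parse_listing(SAMPLE_HTML)
print(formulas)
print(last_page)
```

Returning the command as a list together with its working directory means the pair can be handed straight to `subprocess.Popen(argv, cwd=cwd)`, without the `cmd.split()` step, which would break on paths containing spaces.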
