After RightsCon: What’s Next for CYRILLA?

Early Saturday morning at Columbia University’s Global Center in Tunis. June, 2019.

Work continues full-steam ahead with the CYRILLA Collaborative! Last time, we updated you on our inaugural strategic partners’ meeting, hosted at Harvard University Law School and the Berkman Klein Center back in January. Last month, we were fortunate to keep that momentum going at our second strategic partners’ meeting for the CYRILLA Collaborative, held immediately after RightsCon 2019 at Columbia University’s Global Center in Tunis. 

On June 15th, partners SMEX, the Association for Progressive Communications, CIPIT, Columbia Global Freedom of Expression, Derechos Digitales, and HURIDOCS came together once again for a full day of collaboration. We not only picked up where we left off with some of the great work emerging from the previous meeting, but also explored some fascinating new topics with exciting implications for the CYRILLA digital rights legal database. A guest from GUARDINT, another initiative working at the intersection of law and digital rights, presented their work tracking surveillance-related case law in Europe and learned more about our data model and taxonomy.

From developing the data model to incorporating machine learning capabilities, here are some of the highlights and key outcomes of our gathering:

Iterating the CYRILLA Data Model and Taxonomy

In Tunis, we revisited the core components of the CYRILLA suite of open tools: the open data model and the collaboratively produced taxonomy of digital rights topics. Although the data model is in its final stages, we discussed a few outstanding issues, such as:

  • How can users not only identify and filter courts by their specific names, but also compare similar types of courts across jurisdictions? For now, we are adopting the Judicial Body filter list from the Columbia Global Freedom of Expression database, but we will continue to think this question through.
  • Currently, individual laws in the database exist as singular entities, but what about digital rights-relevant amendments to these laws? Because amendments are not always integrated into publicly available versions of legislation, we decided they should be treated as separate, linked entities in order to keep the data model consistent.
The current draft of the CYRILLA data model. June, 2019.
  • As we come closer to finalizing the data model, we will circulate it for broader community input and review before implementing it on CYRILLA and in the databases of some of our partner organizations. Please reach out to if you would like to be part of the review process!
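To make the amendment decision concrete, here is a minimal Python sketch of laws and amendments stored as separate, linked entities. The class names, the fields (`law_id`, `amends`, `amendment_ids`), and the sample law are hypothetical illustrations under our own assumptions, not the actual CYRILLA data model or Uwazi's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


# Hypothetical entities: names and fields are illustrative, not CYRILLA's schema.
@dataclass
class Law:
    law_id: str
    title: str
    country: str
    enacted: date
    amendment_ids: list[str] = field(default_factory=list)  # links, not merged text


@dataclass
class Amendment:
    amendment_id: str
    amends: str  # law_id of the law this amendment modifies
    title: str
    enacted: date


def link_amendment(law: Law, amendment: Amendment) -> None:
    """Record a link from the law to the amendment; the amendment stays a separate entity."""
    if amendment.amends != law.law_id:
        raise ValueError(f"{amendment.amendment_id} does not amend {law.law_id}")
    law.amendment_ids.append(amendment.amendment_id)


def amendments_for(law: Law, store: dict[str, Amendment]) -> list[Amendment]:
    """Return a law's linked amendments in chronological order."""
    return sorted((store[a_id] for a_id in law.amendment_ids), key=lambda a: a.enacted)


# Usage with invented sample data
law = Law("law-001", "Sample Telecommunications Law", "XX", date(2001, 3, 1))
first = Amendment("amd-001", "law-001", "First amendment", date(2010, 1, 1))
second = Amendment("amd-002", "law-001", "Second amendment", date(2015, 6, 1))
store = {a.amendment_id: a for a in (first, second)}
for amendment in (second, first):  # linked out of order on purpose
    link_amendment(law, amendment)
print([a.title for a in amendments_for(law, store)])
# → ['First amendment', 'Second amendment']
```

Keeping amendments as their own records means a law's original entry is never overwritten, and its amendment history can still be reconstructed where no official consolidated version exists.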

Incorporating Machine Learning into the CYRILLA Database

We were also eager to begin the process of incorporating machine learning capabilities into the CYRILLA database. Our technical partner HURIDOCS has taken the lead on this effort, with the support of grants they received from NESTA and Google. 

  • During RightsCon, in the days before our meeting, HURIDOCS Artificial Intelligence Specialist Natalie Widmann conducted a short study with a number of our partners. By analyzing their interactions with the documents from Columbia Global Freedom of Expression’s database, the study sought to surface patterns about how users of the platform locate and categorize documents.
  • At the meeting, Natalie shared the findings of her study with the rest of the group. Then, she briefed the group on her work at HURIDOCS and the different machine learning techniques that can improve CYRILLA’s usability. HURIDOCS intends to use the data from this study to begin implementing a semantic search function that will make it easier for CYRILLA users to locate and tag documents across different datasets.

Looking to the Future: Governance of the CYRILLA Collaborative

We’re almost a whole year into our grant for the CYRILLA Collaborative, and our time together in Tunis was a wonderful opportunity to reflect and take stock of everything we’ve been able to achieve together with our partners! 

  • To make sure that this work can continue sustainably, we picked up the conversation about governance models for CYRILLA that began at our last partners’ meeting. The broad consensus was that CYRILLA should become a stand-alone entity overseen by a board.
  • In support of this, over the coming months, we will be working together with our partners and the broader community to begin developing terms of reference for a CYRILLA governance board and its individual members, as well as visioning documents to chart the medium and long-term development goals of the Collaborative.
  • We also discussed linking CYRILLA to a larger, potentially academic, entity, but all agreed that we did not want the Collaborative to lose its identity as a network of global south–based organizations.

Over the coming months, we will be uploading a ton of legislation and case law from South Asia, Latin America, and sub-Saharan Africa. If you’d like to collaborate with us, we want to hear from you! Email to get in touch.

Becoming CYRILLA: A Quick Recap of the Past Year

Welcome! This is CYRILLA’s most recent update to the Digital Rights Law mailing list, but we wanted to share it with everyone following the project as well. If you are interested in the development of digital rights law, you can subscribe to the mailing list.

We’re excited to share with you some news on the significant progress we’ve made in the past year! We’ve now transformed our early Arab Digital Rights Datasets into the CYRILLA Collaborative, a global initiative to map and analyze legal frameworks for digitally networked spaces through open research methodologies, data models, taxonomies, and databases.

The CYRILLA Collaborative (CYRILLA stands for Cyberrights Research Initiative and Localized Legal Almanac) is a joint effort across a number of digital rights research and advocacy organizations. It seeks to make legislation and case law that affects human rights in digitally networked spaces more accessible to a wider range of actors, so they can more readily and confidently assess digital rights legal trends and their impacts. The core tools of the Collaborative are an online database (hosted on the HURIDOCS Uwazi platform) and a suite of open tools, which can be adopted and adapted by any individual or organization grappling with questions about the legal realization of digital rights:

  • A working definition of digital rights.
  • A legal research methodology to help researchers locate laws that affect digital rights in existing and evolving legal frameworks.
  • An open data model on top of which developers will be able to build new applications, including those that pull and merge data from other similar datasets (in development).
  • A collaboratively produced taxonomy of digital rights topics (in development).
  • An open API.

Get Involved!

CYRILLA is a community-supported, network-centric resource, for which we actively and enthusiastically seek broad participation and contribution. There are several ways you can get involved:

  1. Mapping the legal framework for digital rights in your country or across a specific issue area
  2. Alerting us to new digital rights law, case law, or related analysis in real time, by forwarding links to or tagging @cyrilla in a tweet
  3. Becoming a trusted contributor or peer reviewer of new law, case law, or analysis on
  4. Reviewing our developing digital rights issue taxonomy and/or data model
  5. Visualizing the data we have in new and interesting ways
  6. Developing new tools on top of our API
  7. Experimenting with our datasets using machine learning, natural language processing, or other techniques

Whatever makes sense for your work, we want to explore how to get you involved! Email to get in touch, or follow us on Twitter at @cyrilla.

The CYRILLA Collaborative used string and index cards to develop a draft of the data model at the partner meeting. February 2019.

Meanwhile, here are some highlights and updates from the past few months:

CYRILLA Partners’ Meeting

  • Over the course of three days, we set key thematic and semantic parameters for a digital rights legal taxonomy, mapped the foundation for the CYRILLA data model, created user stories for the design of the database’s user interface and taxonomy, and explored how to maximize collaborative synergies between partners.

Presentations, Panels and Workshops

  • In August 2018, incubating director and SMEX executive director Jessica Dheere presented CYRILLA at the Annenberg Oxford Media Policy Summer Institute, which coincided with the formal launch of the project. In November 2018, CYRILLA Collaborative partners met at the Internet Governance Forum in Paris, where SMEX’s session “Making National Laws Good for Internet Governance” had been accepted into the program.
  • Finally, earlier this month, SMEX presented the CYRILLA database at the 2019 Internet Freedom Festival in Valencia, Spain, introducing the CYRILLA Collaborative, demonstrating how to navigate the website through specific user stories, and explaining how people could get involved (including by joining this mailing list!).

In the Coming Months

  • The Association for Progressive Communications will begin to upload data from its Unshackling Expression report for South and Southeast Asia to the CYRILLA database;
  • CIPIT will expand and reformat its trademark database of Africa ICT policy to make it more interactive and searchable;
  • Derechos Digitales will follow suit and improve the data in its RedLatam database for the Latin America region;
  • Columbia Global Freedom of Expression will add an Arabic language database of seminal case law on free expression as well as cases from across the global south;
  • SMEX will continue to refine the CYRILLA database and add more case law from the Middle East and North Africa; and
  • HURIDOCS will continue working to make the Uwazi platform better suited to the Collaborative’s data.

Again, if you’d like to collaborate with us, we want to hear from you! Email to get in touch.

Use Case: Using CYRILLA for Global Partners Digital’s Encryption Map

Global Partners Digital (GPD), a UK-based social purpose company committed to protecting human rights in digital spaces, maps legislation that impacts the use of encryption technologies, highlighting the key articles that limit or restrict the use of these technologies. To help GPD expand its World map of encryption laws and policies to include countries in the Middle East and North Africa, SMEX used the CYRILLA database of global digital rights law to explore encryption laws in Algeria, Egypt, Iraq, Jordan, Lebanon, Morocco, Saudi Arabia, Syria, Tunisia, and the United Arab Emirates. 

The GPD research guidance asks researchers to assess the overall environment for encryption regulation in a country, analyzing six indicators: 

  1. the general encryption law,
  2. minimum or maximum encryption standards,
  3. licensing and registration requirements for encryption technologies,
  4. import and export controls,
  5. provider assistance provisions, and
  6. the power of the government to enforce decryption.

In most of the Arab League countries, relevant information about encryption legislation is not readily available in a single law, but spread across an array of laws, including telecommunications laws, anti-cybercrime laws, anti-terrorism laws, intellectual property laws, and, in some cases, stand-alone encryption laws. Moreover, depending on the country’s political system, these provisions do not always appear in independent statutes, but can also be found in amendments, regulations, and decrees. 

Navigating CYRILLA By Keyword Filter 

To find the relevant legislation, SMEX first selected the “encryption law” keyword in the right-hand toolbar to filter for stand-alone encryption laws, and started the research with the two countries that had them: Morocco and Tunisia. SMEX also learned that many countries did not have a stand-alone encryption law. For those countries, we expected that telecommunications laws, information crimes laws, and potentially other types of laws would contain language dealing with encryption technologies. Even in the countries that did have stand-alone laws, we suspected that other laws also contained provisions affecting encryption. Therefore, to make our search as comprehensive as possible, we expanded it to more encryption-related terms across a wider range of laws.

CYRILLA’s filter function, 2018.
The card for the Moroccan law that deals with encryption, October 2018.

Using Full-Text Search

To find the encryption provisions in laws that do not exclusively deal with encryption, SMEX used the platform’s full-text search function to find laws that contained the words “encryption,” “cryptography,” “decryption,” and “التشفير,” the Arabic word for “encryption.” From this search, SMEX was able to identify many more telecommunications laws, information crimes laws, and electronic transactions laws with articles that related to encryption.   
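A keyword search across several terms in more than one language can be sketched as follows. This is a hypothetical, in-memory illustration of the approach described above, assuming each law is a dictionary with `title` and `text` fields; it is not how CYRILLA or Uwazi actually implements full-text search, and the sample records are invented.

```python
from __future__ import annotations

import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and case folding before comparing."""
    return unicodedata.normalize("NFKC", text).casefold()


def full_text_search(laws: list[dict[str, str]], terms: list[str]) -> list[str]:
    """Return titles of laws whose text contains at least one of the terms."""
    wanted = [normalize(term) for term in terms]
    hits = []
    for law in laws:
        body = normalize(law["text"])
        if any(term in body for term in wanted):
            hits.append(law["title"])
    return hits


# Invented sample records, not real legislation
laws = [
    {"title": "Telecom Law (sample)", "text": "Use of Encryption requires prior approval."},
    {"title": "Cybercrime Law (sample)", "text": "نص تجريبي يذكر التشفير"},
    {"title": "Press Law (sample)", "text": "Journalists must register with the ministry."},
]
terms = ["encryption", "cryptography", "decryption", "التشفير"]
print(full_text_search(laws, terms))
# → ['Telecom Law (sample)', 'Cybercrime Law (sample)']
```

Plain substring matching has a useful side effect for Arabic: searching for the indefinite form “تشفير” also matches “التشفير” (with the definite article ال) and prefixed forms such as “بالتشفير”. A production search engine would more typically handle such variants with language-aware stemming.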

The first two results when you search “encryption” in the CYRILLA database, October 2018.

After taking these steps for countries with and without independent encryption laws, SMEX noticed that the report lacked information about the licensing, import, and export of encryption technologies. To remedy this, SMEX added the search terms “import,” “export,” “licensing,” and “registration,” along with their Arabic translations. This expanded search allowed SMEX to find articles that did not mention encryption directly but still applied to the import and licensing of encryption technologies. For example, Article 44 of Egypt’s Telecommunications Law “prohibits the import, manufacture or assembly of any telecommunication equipment without a licence from the National Telecom Regulatory Authority.”

Through CYRILLA, SMEX was able to locate the articles that permitted or restricted encryption and provide GPD with copies of the relevant laws in Arabic, French, and English. Most of the laws needed for this research were readily available in CYRILLA, except for a few specific data protection regulations in Tunisia that earlier research had not turned up. After locating the regulations online, SMEX added them to the database.

A search for “import” in CYRILLA, October 2018.


Throughout the process, we ran into a couple of structural challenges. Most notably, some of the PDFs are scanned images rather than text-searchable documents, so the search function could match terms only in their metadata, not in their full text. For these laws, SMEX had to spend more time reading through the text by hand. While optical character recognition (OCR) technology for making English-language PDFs text-searchable is widely available, comparable technology does not yet work as reliably for Arabic-language PDFs. HURIDOCS, the developer of Uwazi, the platform on which CYRILLA is built, is working to solve this problem.
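The metadata-only limitation can be illustrated with a small, hypothetical model of a search index. In this sketch (our own simplification, not Uwazi's implementation), each record has metadata (`title`, `keywords`) and an optional `text` field that stays empty when a PDF is a scanned image, so a term that appears only in a scanned law's body is never matched. The records are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LawRecord:
    title: str
    keywords: list[str] = field(default_factory=list)  # metadata tags added by researchers
    text: Optional[str] = None  # None when the PDF is a scanned image with no text layer


def searchable_content(record: LawRecord) -> str:
    """Combine everything the index can actually read for this record."""
    parts = [record.title, *record.keywords]
    if record.text is not None:
        parts.append(record.text)
    return " ".join(parts).casefold()


def search(records: list[LawRecord], term: str) -> list[str]:
    """Return titles of records whose readable content contains the term."""
    needle = term.casefold()
    return [r.title for r in records if needle in searchable_content(r)]


records = [
    LawRecord("Law A (sample, text layer)", ["telecommunications"],
              "No equipment may be imported without a licence."),
    # Law B's scanned body may mention imports, but only its metadata is searchable
    LawRecord("Law B (sample, scanned)", ["telecommunications"]),
    LawRecord("Law C (sample, scanned)", ["telecommunications", "import"]),
]
print(search(records, "import"))
# → ['Law A (sample, text layer)', 'Law C (sample, scanned)']
```

Extracting text from scanned PDFs (for example with OCR) and storing it in `text` would make records like Law B findable; adding accurate keywords to the metadata, as with Law C, is only a stopgap.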

Keeping the laws in the database up to date remains a challenge as well. When SMEX identified a few of the missing data protection regulations in Tunisia, they also realized that the database did not have the most recent amendments to a couple of laws that these regulations referenced. SMEX was able to find those amendments online and add them to the database, but locating the laws took additional time, and some of SMEX’s initial analysis had to be revised.

Key Takeaways 

As CYRILLA evolves, research projects and other practical use cases help us ensure that the information on the platform is up to date and encourage us to think more critically about the best way to present, sort, and contextualize the data. This project, in particular, pushed us to consider how CYRILLA should index and display keywords as we improve the current version. For example, some telecommunications laws contain importation and exportation requirements for all telecommunications-related equipment, which can include encryption technologies, but does that mean that this law should be categorized as an encryption law? Does the platform need a hierarchy for keywords? As a first step, the CYRILLA Collaborative will convene at the Berkman Klein Center at Harvard University this week to develop a draft digital rights law taxonomy that will improve the navigability and data structure of this database and perhaps others. 

If you would like to be notified about future use cases, contribute to the draft taxonomy, or otherwise get involved, or if CYRILLA has been helpful to your work or research in any way, please let us know at

ADRD at IFF: Learning Who Will Use the Datasets, and How

A persona created to represent a user of the Arab Digital Rights Datasets at IFF in March, 2017.

In early March, the Datasets’ legal adviser, Nani Jansen, and I led a session on the Datasets at the Internet Freedom Festival (IFF), a weeklong gathering of digital rights advocates from all over the world in Valencia, Spain. We used the session, “Data Exploration Hackathon: Visualizing the Relationship between Rule of Law and Digital Rights in MENA and Beyond,” to introduce phase 3 of the project and explore who might use this data and exactly how.

The very dedicated attendees of the session, who spent three hours with us in Taller 6 (the smallest, stuffiest room at the otherwise very comfortable Las Naves), included lawyers, journalists, advocacy directors and activists, human rights researchers and academics, as well as program officers from international agencies and donors.

This was the first public presentation of the Datasets since the previous year at IFF. We began by introducing the history of the project and how the methodology and categorization of laws have developed over the past several months, leading up to the current data collection phase, during which 13 legal and human rights researchers are identifying laws, regulations, draft laws, case law, and specific articles of interest related to digital rights in the legal frameworks of the 22 countries of the Arab League. We’ll be posting more about these processes here soon.

We spent the rest of the session in breakout groups, gathering input on who our stakeholders are and what they want from such a dataset by developing user personas and user stories. These tools, commonly used by software and website developers to get a sense of who their users are, will ultimately help us write the technical specifications for putting the dataset online, which we expect to include a simple website for basic queries and perhaps a plan for an API.

For instance, one user persona/story went like this:

Basma, an independent activist and blogger, is in her late twenties. Her first experiences in activism started in college. Basma writes about social and political issues on her blog, and she has a dedicated following. She changes jobs frequently and earns a small income from ads on her blog. Her political activities are a financial burden for her, and she cannot afford unexpected expenses, such as fines or legal costs. Basma visits the Datasets frequently so she can stay up to date on laws that apply to her blog. She also finds data that she can use in her blogposts.

Another group imagined Samya:

A freelance outreach coordinator on Internet freedom issues for international audiences. She lives in France, and once, when trying to communicate with her parents in Morocco, she realized that she couldn’t speak to them over VoIP. This prompted her to research the issue, as she also does for the outreach initiatives and campaigns she advises. She also needs to assess the legal threats posed by her work, and those facing her clients and their partners. She often needs to write situation assessments and other reports quickly, but must be sure that the information she’s citing is accurate, so as not to compromise her credibility or that of her clients. Also, if she can’t find the laws she needs, she must be able to explain why, so it’s crucial that she be able to assess how complete the Datasets are and how frequently they are updated.

Several journalist personas were also created, as follows:

I’m a professional female Arab journalist working in the region and a member of my national journalists’ syndicate. I need to know the current provisions of the laws so I can provide expert input into a government consultation or public hearing.

I’m a foreign freelance journalist (female, mid-20s) on a tight budget. I’m covering a story in Tunisia and I need to know the laws on defamation, freedom of expression, social media, etc., so that I can keep myself and my fixer safe.

I’m an experienced English-speaking journalist based in New York. I need to know which countries criminalize posting “false news” online, so that I can write an article about the dangers. If I can’t verify my information beyond a shadow of a doubt, my editor won’t run the story. Plus, I need examples of individuals who have been prosecuted under these laws. Oh, and I’m on a very tight deadline.

Another group developed a persona for a researcher, approaching the Datasets from an academic’s perspective:

Leila, a researcher investigating the state of digital rights across the MENA region, wants to conduct comparative and longitudinal research and to correlate her findings with external themes. Specifically, she wants to know how the political changes of 2011 changed government attitudes towards the right to privacy in MENA countries, looking at the period from 2006 to 2016.

Finally, the last group imagined a policy analyst at a foreign ministry, an advocate/funder at an international media development organization, and a technologist/digital security expert. Here are their stories:

James, a technologist/digital security trainer, needs an up-to-date reference source of locally verified information to pass on to his co-trainers in the field, so that they can do a pre-training assessment of the legality, risks, and usefulness of various tools and practices, which will help them prioritize which topics to cover in the limited time they have.

Hannan, who develops partnerships with local organizations, wants an interactive, customizable index, map, or database that will help her detect trends and even upload locally collected data to model and manipulate programmatic interventions. The data should be sliceable at national, regional, and global levels.

Giselle is a policy analyst at a foreign ministry that invests millions of dollars each year into internet freedom initiatives. She needs a queryable database of cyber-related laws so that she can look at trends and comparative data that can inform her critiques of flawed legislation and draft model language.

Many of these personas actually represented people in the room. This deviated somewhat from the usual aim of a user persona and user story exercise, which is meant to keep entrepreneurs, developers, and technologists from building what they think people need. Still, it’s hard to argue that these ideas don’t reflect the spectrum of needs we hope the Datasets will serve.

At the same time, some perspectives weren’t represented, such as those of lawyers (particularly human rights defenders) and activists, who we think might also find the Datasets useful for developing arguments in court or identifying problem areas for targeted policy reform. Participants also suggested that we host a similar workshop (or series) back in Lebanon with only participants from the region, or with only one kind of stakeholder, such as researchers. This, they suggested, would help us drill down even further into how this data can better benefit the primary communities it’s meant to serve.

Another suggestion was to develop personas not by stakeholder group (journalists, activists, lawyers, etc.) but according to how people would use the data and/or their specific decision-making processes and workflows.

We’re planning to do that. But first, we’ll run a similar exercise at RightsCon next week, on Friday at noon, in the Demo Room. If you’re in Brussels, we’d love to see you there.



Datasets Featured in MEDMEDIA Projects Database

MedMedia is a cross-Mediterranean program, implemented by multiple stakeholders and designed to “complement ongoing campaigns to promote media freedoms and overcome the barriers to sectorial change.” As part of the initiative, a database of media development projects across the region has been created to help minimize overlap of efforts, among other aims. The Arab Digital Rights Datasets are included in this mapping and will be featured on the Med-MEDIA website in an upcoming blogpost.

EFF “Crime of Speech” Report References Datasets

The Electronic Frontier Foundation (EFF) has published “Crime of Speech: How Arab Governments Use the Law to Silence Expression Online,” a new report by Wafa ben Hassine that looks at legal frameworks for online expression in the MENA region generally, and in particular examines which kinds of laws are being used in four Arab-region countries to crack down on online expression. Ben Hassine completed the report during a six-month term as an Information Controls Fellow through the Open Technology Fund.

Among Ben Hassine’s key findings are

that law enforcement only applies them after it’s identified the journalist or protestor that it wants to arrest. The pattern is that authorities will find the offending speech and then choose the law that can be interpreted to most closely address it. The system results in a rule by law rather than rule of law: the goal is to arrest, try, and punish the individual—the law is merely a tool used to reach an already predetermined conviction.

The report relies heavily on the Arab Digital Rights Datasets and cross-references that data with “specific cases of arrest, detention, and imprisonment due to online activity, and where law enforcement targeted the individual under the guise of going after cybercrime or countering terrorism online.”

Like the legislative data, Ben Hassine’s data of arrests and detention is also openly accessible in CSV format.


A Webinar with on the Arab Digital Rights Datasets

Update August 10, 2016: Silk was purchased by Palantir and is no longer being maintained. We will be updating the dataset and porting over the data to a new online outpost in mid to late 2017.

In this webinar, I spoke with Jurian Bass and Sarah Aoun, from, a platform where people can easily upload and visualize their datasets. I talk about how SMEX created the open Arab Digital Rights Dataset and then used as a publishing platform to create, a multimedia research portal that “illuminates trends in how Arab governments are limiting digital rights, such as free expression and privacy online.”

Datasets Kick Off MENA Internet Policy Observatory Workshop in Istanbul

Earlier this month, the Annenberg School for Communication’s Internet Policy Observatory teamed up with Citizen Lab, ASL19, Social Media Exchange, 7iber, and Kadir Has University’s New Media Department to host an Internet Policy Research Methods Workshop focused on policy development in the MENA region. The program brought together young scholars and activists working in digital rights and the internet policy space in an intensive four-day practicum that provided a survey of both qualitative and quantitative, online and offline research methods with the goal of enhancing and advancing their advocacy efforts.

SMEX had the privilege of framing the workshop by outlining the current state of digital rights in the MENA region in our session, “Advancing Policy Advocacy for Digital Rights in the MENA Region.” Drawing on recent research and the Arab Digital Rights Datasets, which we had recently visualized using the Silk.co platform, we focused on emerging legal and social trends and how civil society and citizens at large are responding to them.

We also highlighted several recent advocacy initiatives, their successes and failures, and explored how the availability—or lack thereof—of Internet policy data enhances (or prevents) advocacy efforts to protect free expression and privacy online.

In advance of the workshop, we shared the following briefing materials with participants:

Experimenting with the Datasets at 2014

With a greatly expanded dataset (laws from 20 countries, nearly twice as many as the original dataset, which covered six countries and Iran), SMEX participated in the inaugural workshop hosted by Small Media in London last month.

The two-day workshop covered principles of data visualization and then gave teams, each made up of civil society organizations with datasets, designers, and coders, a chance to play with their data and explore how to make it relevant to processes of change.

This was the first time SMEX was able to see the ADRD in action. Our design and coding sprint culminated in a presentation for a prototype, which gives an idea of the kinds of questions we wanted to ask of the data, including:

  • Whether laws were passed more quickly in the wake of the Arab Spring;
  • Ideas for how to cross-reference the legislation with other types of data, such as individual cases of detention and prosecution for alleged online speech crimes; and
  • An already well-established sense that, since this is largely a user-generated dataset, more work would need to be done on the methodology to make it a reliable source for research, reporting, and legal proceedings.

The Emerging Legal Framework for Free Expression Online in MENA

This paper surveys the emerging legal framework for online expression in the Arab region and is the foundation for a series of blogposts on the topic on the SMEX website. The research conducted for it, along with the initial data collection, originally spurred the idea of the datasets.
